This contract had two major objectives. The first was to build, test,
and deliver to the government an entry control system using speaker
verification (voice authentication) as the mechanism for verifying the user's
claimed  identity.  This  system  included  a  physical  mantrap,  with  an 
integral  weight  scale  to  prevent  more  than  one  user  from  gaining  access 
with  one  verification  (tailgating).  The  speaker  verification  part  of  the 
entry control system contained all the updates and embellishments to the
algorithm that was developed earlier for the BISS (Base and Installation
Security System) under contract with the Electronic Systems Division
of the USAF. These updates were tested prior to and during the
contract on an operational system used at Texas Instruments in Dallas,
Texas, for controlling entry to the Corporate Information Center (CIC).
Rather  than  update  the  existing  BISS-ASV-ADM  (BISS  -  Automatic  Speaker 
Verification  -  Advanced  Development  Model),  the  complete,  updated 
algorithm  was  provided  to  the  Air  Force  on  a  totally  new  system  of  three 
computers,  which  was  tested  for  six  months  at  Texas  Instruments.  Over 
13,000 accesses were performed using this system, with less than 1.0% of
the users being refused access based on their vocal characteristics
(0.75% if users were limited to two attempts). Off-line tests of casual
impostors yielded an error rate of less than 1.0% with an over 90%
confidence level. This Voice Verification Upgrade (VVU) system has been
delivered  and  installed  at  Rome  Air  Development  Center,  Griffiss  Air 
Force  Base,  New  York,  and  is  operational  in  the  RADC/IRAA  Laboratory. 

The  second  purpose  of  this  contract  was  the  continued  research  into 
voice  authentication  algorithms  and  entry  control  system  performance. 
Pursuant  to  these  objectives,  the  following  studies  were  performed: 

1.  a  trade-off  study  on  speaker  verification  performance  as  a 
function  of  the  prompting  words, 

2.  a  simulation  of  booth  traffic  for  an  entry  control  system  using 
speaker  verification,  and 

3. a study of speaker verification performance using an LPC-based
prediction residual.

The last of these three studies was by far the most extensive, and provided
an order of magnitude improvement in performance, resulting in
performance exceeding the goals set for this contract (less than 1.0% true
speaker rejections and 0.1% impostor acceptances). Included in this
study  was  the  completion  of  an  on-line,  real-time  demonstration  of  the 
LPC-based  speaker  verification  method  on  the  VAX  11/780  at  the  Speech 
Systems  Research  Laboratory  at  the  Texas  Instruments  facility  in  Dallas. 

SECTION I


INTRODUCTION 


This final report covers the sixth in a series of programs undertaken
by Texas Instruments, under government sponsorship, to further develop
speaker verification (voice authentication [1,2]) technology. The
relationships between these six government-funded programs and internally
funded voice authentication developments are shown in Figure 1. In
the first program [3] (SV1), a promising high-performance speaker
verification technology was developed and comprehensively tested in a
laboratory environment, with accurate and reliable methods of time
registration providing a major performance impact.

In the second program [4] (SV2), operationally important problems
were solved to provide an operational capability for applications such
as automatic entry control. Concurrent with this second program were:

1. The development of an Advanced Development Model voice
verification system for the Base and Installation Security
Systems (BISS) program under Electronic Systems Division
sponsorship [5] using a TI-980 minicomputer. (This system
was subsequently tested by Mitre [6] in side-by-side tests
of verification systems using handwriting and fingerprint.)

2. The installation of an operational, fully automated entry
control system, internally funded, to provide entry control
to the Texas Instruments Corporate Information Center [7]
(CIC), also using a TI-980 minicomputer.

In the third program [8] (SV3), advanced speech processing capabilities
were developed to enhance speaker verification effectiveness, and
speaker verification technology was extended to other applications.
Effort was focused on two specific applications: speaker verification
using passwords embedded in free text and speaker identification
(and subsequent verification) using spoken identification codes (called
"Total Voice" verification). Both of these required major emphasis on
the development of word recognition technology and the integration of
recognition and verification techniques.

The fourth program [9] (Remote Terminal SV) was a study conducted
to develop speaker verification techniques for use over degraded communication
channels, specifically telephone lines. A test of BISS-type
speaker verification technology was performed on a degraded channel and
compensation techniques were then developed.

The fifth program [10] (Total Voice SV) was the coalescence of the
Total Voice verification technology and the hardware of the Advanced
Development Model BISS speaker verification system (then located at RADC),
culminating in the installation of the Total Voice computer program on
the BISS-SV system.

During the time period of the third, fourth and fifth contracts,
however, Texas Instruments was spending internal funds to improve the
voice verification algorithms and to develop a commercially available
entry control system using voice authentication. The algorithm development
was done both on-line, using the entry control system (ECS) at CIC,
and off-line, using a specially collected laboratory data set.

In early 1977, TI's Digital Systems Group (DSG) began development
of an entry control system using TI 990 minicomputers (rather than the
TI 980's used in the older systems), with the programs written
primarily in Pascal, plus a few 990 assembly language routines (rather
than Fortran and 980 assembly language, as before). This entry control
system  used  most  of  the  algorithm  improvements  made  during  the  trade-off 
studies  on  the  CIC  system,  with  the  addition  of  more  security  protocol 
(inventory  control,  deactivation  of  users,  time-of-day  security  checks, 
etc.),  improved  report  generation  capabilities,  and  an  improved  (more 
complicated)  interface  to  the  security  personnel. 

However, the need for such a system to service multiple
entrances required changing the system hardware architecture, with some
new limitations resulting from the introduction of the required digital
communication line between the central computer and the entry control
booth. (Analog signals from the microphone and to the speaker that were
used when the computer was colocated with the booth are not appropriate
for long distance transmission, as might be required for BISS, for example.)
This digital communication line, although operated at 9600 bps,
was still a bottleneck, resulting in somewhat slower response time
between the user's input speech and a verified/not verified response.
In addition, the oral prompting used to direct the user could no longer
be transferred over the communication line without further degrading the
response time. (The prompts on the older systems were PCM-coded with
8,000 eight-bit samples per second, thus requiring about 8 seconds to
transmit a one second prompt.) Hence, some efficient digital encoding
scheme was required for the prompts, which would allow either very
efficient transmission of the speech or efficient storage at the entry
control booth end. Such coding, however, resulted in a degradation in
the quality of the oral prompts.
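
The figure of about 8 seconds per one second prompt can be checked with a short calculation. The sketch below assumes an asynchronous link that frames each 8-bit sample with one start bit and one stop bit (10 line bits per sample); this framing overhead is an assumption and is not stated in this report, and the function name is illustrative only.

```python
def prompt_transmit_seconds(prompt_seconds, sample_rate=8000, bits_per_sample=8,
                            line_bps=9600, framing_bits=2):
    """Time needed to send a PCM-coded prompt over a serial line.

    framing_bits is the assumed per-sample overhead (start + stop bit) of an
    asynchronous link; set it to 0 to count raw payload bits only.
    """
    line_bits_per_sample = bits_per_sample + framing_bits
    total_bits = prompt_seconds * sample_rate * line_bits_per_sample
    return total_bits / line_bps


print(round(prompt_transmit_seconds(1.0), 2))                  # 8.33 (with framing)
print(round(prompt_transmit_seconds(1.0, framing_bits=0), 2))  # 6.67 (payload only)
```

Under either assumption the prompt takes several times longer to transmit than to play, which is why a more compact prompt encoding was needed.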

The  resulting  system  developed  by  DSG  is  shown  in  Figure  2.  Such 
systems  were  delivered  both  to  Allied  General  Nuclear  Services  (AGNS)  in 
August  1979,  and  to  CIC  in  April  1980,  to  replace  the  older,  single 
booth,  980-based  system. 

Although this current contract (Voice Verification Upgrade [VVU])
effort between TI and RADC was initially intended to add the new algorithm
improvements to the existing, 980-based, BISS-ASV-ADM, by the start
of the contract effort, it was determined to be more to the government's
benefit for Texas Instruments to deliver an entirely new ASV system,
based on the newer, more serviceable, 990 minicomputers, taking advantage
of the development effort by DSG. However, in order both to improve
the response time of the system to the user (by sharing the computing
load during a verification with a more powerful centrally located
computer) and to reduce the cost of the computing equipment at the entry
control booth (eventually replacing it with a microcomputer), a modification
was made to the system architecture being used by DSG, resulting
in the system shown in Figure 3. This is the system delivered to RADC
for this contract, and will be referred to as the VVU system throughout
the remainder of this final report.

During late 1980, DSG developed and delivered a three-portal system
to AGNS that, although similar to the system in Figure 2, did not contain
any of the portal control (the DSG host communicated with an AGNS central
security computer, which in turn controlled the booth doors and sensors),
resulting in a computer at the entry control booth that only provided
a verification decision to the host for a given reference file
supplied by the DSG host. This architecture results in a stand-alone
computer that makes a verification decision with an externally supplied
reference file. This also is a viable method for achieving the desired
cost reduction in the booth computer, and provides a method for supplying
a product to customers already possessing a computer-controlled security
system. However, due both to timing and to the BISS program's interest
in a total security system, no modification to the system shown
in Figure 3 was made.

In addition to the construction, evaluation and delivery of the VVU
voice authentication system, a number of other experiments were performed:
1) a booth simulation using queueing models, 2) experiments using
LPC residual energy for speaker verification, and 3) a word trade-off
study to determine verification performance as a function of spoken
word. All these experiments are covered in this final report, in addition
to a description of the VVU system and the results of its evaluation.

Figure 3 Voice Verification Upgrade (VVU) System


SECTION  II 


HARDWARE 


The  upgraded  BISS  ADM  ASV  system  is  composed  of  two  primary  parts, 
a  computer  hardware  system  and  an  entry  control  booth.  The 
relationship  of  the  two  is  shown  in  Figure  4.  A  description  of  each 
major  part  of  both  the  computer  system  and  the  entry  control  booth  is 
given  in  the  next  two  subsections. 


A.  CENTRAL  COMPUTER  SYSTEM  HARDWARE 

The  Computer  System  for  the  Upgraded  BISS  ADM  ASV  system  is 
composed  of  a  host  processor,  voice  processor  and  terminal  processor, 
as  shown  in  Figure  5.  The  host  and  voice  processors  are  colocated  in  a 
double  bay  desk  with  the  terminal  processor  located  in  the  equipment 
bay  of  the  Entry  Control  Booth,  which  may  be  located  remotely.  The 
voice  processor  is  connected  via  a  9600  baud  asynchronous  modem  to  the 
terminal  processor.  Terminal  processor  components  will  be  discussed  in 
the  Entry  Control  Booth  section. 

Parts lists for the colocated host and voice processors are given
in Tables 1 and 2.

TABLE 1. HOST PROCESSOR PARTS LIST

ITEM                                                      QTY

 1. 990/10 CPU, mem mapping, 13-slot chassis,              1
    programmer panel, no standby power
 2. TILINE error-correcting memory subsystem               1
    (384K bytes)
 3. DS50 interface kit (for T50 disk drives)               1
    (includes 15 ft bus & 20 ft daisy chain cables)
 4. Trident daisy chain cable (15 ft)                      1
 5. Trident radial cable (20 ft)                           1
 6. T50 disk pack drive                                    2
 7. T50 disk drive terminator                              1
 8. Low boy console for T50                                2
 9. DS50 disk pack                                         4
10. 810 printer master kit (printer, interface,            1
    cable, paper tray and manuals)
11. Model 810 printer stand, without paper tray            1
12. 743 KSR terminal                                       1
13. 743 KSR interface kit (includes 30 ft cable)           1
14. 911 VDT kit (dual 1920-character controller,           1
    2 displays and keyboards)
15. TILINE interface kit                                   1
16. Rackmount equipment cabinet                            1
    (also houses co-located voice processor)

Figure 5 VVU System Components


TABLE 2. VOICE PROCESSOR PARTS LIST

ITEM                                                      QTY

 1. 990/12 CPU, 17-slot chassis,                           1
    programmer panel
 2. Cache memory subsystem                                 1
    (128K bytes)
 3. 990 communications I/F module                          1
    (one needed for each terminal processor)
 4. 9600 baud modem                                        1
    (one needed for each terminal processor)
 5. Vector comparator (nonstandard part)                   1


B.  ENTRY  CONTROL  BOOTH 

Because  building  an  entry  control  booth  represented  a  significant 
task,  several  vendors  of  prefabricated  booths  were  evaluated.  A  visit 
was  made  to  two  booth  vendors  who  offered  integral  weigh  systems, 
Campbell  Engineering  and  Mardix  Corporation.  Both  potential  vendors 
were  given  a  technical  evaluation.  Sites  were  visited  where  each 
vendor  had  installed  booths  similar  to  the  booth  required  by  this 
contract.  In  addition,  both  vendors  were  evaluated  for  cost  and 
delivery  commitment.  The  following  is  a  summary  of  the  evaluation 
made. 

Both  Campbell  and  Mardix  had  proven  experience  in  building 
security  booths.  Campbell  had  delivered  a  small  number  of  booths  to 
Lawrence  Livermore  Laboratory,  and  Mardix  had  delivered  a  large  number 
of  booths  to  many  different  customers.  Mardix  has  a  product  line  of 
security  entry  systems  that  includes  booths.  Campbell  has  no  product 
line  in  security  entry  systems;  the  booths  they  have  are  a  side 
business. Campbell had delivered one (1) booth with an integral weigh
system.  However,  the  Campbell  weigh  system  had  side  loading  problems 
that  seem  to  be  inherent  in  the  design  approach  of  weighing  the  entire 
booth.  Mardix  had  fabricated  a  booth  for  Sandia  Corporation  that 
contained  a  weigh  system  which  weighed  only  the  occupant  area  thereby 
minimizing  side  loading  problems. 

The  Campbell  booth  was  constructed  with  steel  walls  and  steel 
structural  members.  The  Mardix  booth  used  special  plywood-metal 
laminated  walls  that  are  not  easily  penetrated  but  light  enough  for 
portability.  The  framing  in  the  Mardix  booth  used  both  steel  and 
aluminum  structural  members. 

The  cost  of  the  Campbell  booth  was  estimated  to  be  50%  more  than 
the  comparable  Mardix  booth  due  primarily  to  the  following  reasons: 

1.  The  Campbell  booth  had  a  higher  material  cost  than  the 
Mardix  booth. 

2. The Mardix weigh system had to be modified only slightly,
while the Campbell weigh system required a total redesign.

With regard to delivery schedules, Campbell was willing to commit
to a 150-day ARO (after receipt of order) schedule, but did not make a
formal commitment. Mardix committed to a 90-day ARO schedule with a
submitted formal quotation.

In  conclusion,  both  vendors  met  construction  requirements, 
although  the  Campbell  booth  was  superior  in  both  materials  and 
construction.  However,  since  the  Mardix  design  represented  a 
significantly  lower-risk  approach  with  respect  to  overall  cost  and 
delivery,  it  was  chosen  over  the  Campbell  booth. 

After  delivery  of  the  Mardix  booth,  shown  in  Figure  6,  several 
additions  and  modifications  were  made.  The  additions  included 
installation  of  double  doors  with  an  integral  user  terminal  (see  Figure 
7)  to  separate  the  equipment  bay  from  the  occupant  area,  installation 
of  equipment  racks  behind  the  doors,  and  installation  of  a  light 
fixture,  dropped  ceiling,  and  sound  absorbing  material  to  minimize 
noise  in  the  occupant  area.  Figure  8  illustrates  the  layout  of  the 
equipment  bay  in  the  entry  control  booth.  In  addition,  the  user 
terminal,  mounted  in  the  right  equipment  bay  door,  was  lined  with  sound 
absorbing  material  to  minimize  acoustical  reflections.  The  user 
terminal  has  an  overall  depth  into  the  door  of  10  inches,  extending  2 
inches  into  the  occupant  area  and  recessed  8  inches  into  the 
electronics  bay.  This  design  was  selected  to  impart  a  sense  of  privacy 
to  the  user  and  to  keep  the  microphone  from  obstructing  movement  in  the 
booth.  The  overall  height  of  the  user  terminal  is  45  inches,  with  the 
height  of  the  microphone  base  being  58  inches  from  the  floor  of  the 
booth. Because the microphone can be positioned from 30 degrees below the
horizontal to 60 degrees above the horizontal and is 10 inches
long, the 58 inch height of the base comfortably
accommodates from the fifth percentile of female heights through the
95th percentile of male heights [11]. The keyboard is mounted in the
lower  plate  of  the  terminal  with  the  speaker  in  an  enclosure  mounted  to 
the  ceiling.  Below  the  terminal  is  an  opening  for  insertion  of  a  badge 
to  be  read  via  a  closed  circuit  television  (CCTV)  and  to  be  used  in 
conjunction  with  an  overhead  camera  for  monitoring  and  backup. 
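
The range of heights reachable by the microphone follows from the geometry given above. The sketch below assumes the microphone is a rigid 10 inch arm pivoting at its base and that the height of its tip is the quantity of interest; the percentile data from [11] are not reproduced here, and the function name is illustrative.

```python
import math


def microphone_tip_height(base_height_in=58.0, length_in=10.0, angle_deg=0.0):
    """Height of the microphone tip above the booth floor, in inches.

    Assumes a rigid arm pivoting at its base; angle_deg is measured from
    the horizontal (negative values point the microphone downward).
    """
    return base_height_in + length_in * math.sin(math.radians(angle_deg))


lowest = microphone_tip_height(angle_deg=-30.0)
highest = microphone_tip_height(angle_deg=60.0)
print(f"{lowest:.1f} to {highest:.1f} inches")  # 53.0 to 66.7 inches
```

Under these assumptions the tip spans roughly 53 to 67 inches above the booth floor.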

The  only  modification  was  to  the  floor  in  the  weighed  occupant 
area.  As  supplied,  the  plywood  floor  was  not  sufficiently  rigid. 
Rather  than  replace  the  floor  with  a  steel  plate  (200-300  lbs  for  an 
adequate  thickness),  it  was  reinforced  with  angle-iron  bracing. 


Table 3 is a list of the items used in the entry control booth.

TABLE 3. ENTRY CONTROL BOOTH PARTS LIST

ITEM                                                      QTY

 1. Mardix VG-300 booth/weigh system                       1
 2. Equipment bay doors with integral user                 1
    terminal
 3. BLH Electronics LBPl load cells                        4
    with a load rating of 250 pounds each
 4. Load cell summing box, BLH model 308                   1
 5. Anadex indicator, DPM-735 with option B                1
 6. Microphone, Electrovoice 635A                          1
 7. Microphone base, plug and shields                      1
 8. Speaker                                                1
 9. Set of 6MT control modules for doors, etc.             1
10. Terminal Processor                                     1
11. 9600 Baud Asynchronous Modem                           1
12. 24 Vdc Power Supply with integral tester               1


A block diagram of the terminal processor and its components is
shown in Figure 9. As can be seen in the diagram, the main components
are the Speech I/O and Analog Filter boards. These two hardware
devices comprise the heart of the system. A combined block diagram of
these devices is shown in Figure 10. The terminal processor's CPU
interfaces to the Speech I/O board via its Communications Register Unit
(CRU), a serial bus. Speech and control data are sent, via the CRU,
to a TMC0281 speech synthesizer chip for issuing user prompting
phrases. Data for the phrases are stored locally in the terminal
processor in volatile random access memory. The Speech I/O board also
controls the state of the Analog Filter board by selecting the
appropriate input analog filter channel for conversion to digital data
and subsequent storage by the terminal processor. The primary
responsibility of the Analog Filter board is to amplify and separate
the incoming speech waveform into 14 different frequency bands using
active bandpass filters. In each channel, the speech signal passes through
a two stage active bandpass filter; it is then rectified and passed through an
integrating filter section prior to being digitized by the 10-bit
analog-to-digital converter residing on the Speech I/O board. Figure
11 shows a typical filter on the Analog Filter board.
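
The rectify, integrate and digitize steps of one filter channel can be illustrated in software. The sketch below is a digital stand-in for the analog hardware, not a description of the actual circuit: the 8000 Hz sample rate, the one-pole smoothing constant, the unit full-scale level and the function names are all assumptions chosen for illustration.

```python
import math


def channel_envelope(samples, smoothing=0.01):
    """Full-wave rectify a band-limited signal and smooth it with a one-pole
    low-pass filter (a stand-in for the integrating filter section)."""
    level = 0.0
    out = []
    for x in samples:
        level += smoothing * (abs(x) - level)
        out.append(level)
    return out


def quantize_10bit(value, full_scale=1.0):
    """Map a value in [0, full_scale] onto the 1024 codes of a 10-bit converter."""
    clipped = min(max(value, 0.0), full_scale)
    return min(int(clipped / full_scale * 1024), 1023)


# A 350 Hz tone of amplitude 0.5 stands in for the output of one bandpass filter.
rate = 8000  # assumed simulation sample rate, in Hz
tone = [0.5 * math.sin(2 * math.pi * 350 * n / rate) for n in range(rate)]
code = quantize_10bit(channel_envelope(tone)[-1])
print(code)
```

The smoothed level settles near the mean rectified amplitude of the tone (about 0.32 for an amplitude of 0.5), so the printed code is roughly a third of full scale.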
The terminal processor contains the items listed in Table 4.

TABLE 4. TERMINAL PROCESSOR PARTS LIST

ITEM                                                      QTY

 1. 990/5 CPU with 64K bytes, 6-slot chassis,              1
    programmer panel, standby power
 2. 16 I/O TTL data modules                                3
    (for keyboard, doors and scales)
 3. 990 communications I/F module                          1
 4. 12-key keypad                                          1
 5. Keypad cable                                           1
 6. Speech I/O to filter board cable                       1
 7. 990 to 6MT control modules cable                       1
 8. Analog filter board (nonstandard part)                 1
 9. Speech I/O board (nonstandard part)                    1


The  terminal  processor  interfaces  to  the  integral  weigh  system  via 
its  CRU  through  a  standard  16  I/O  TTL  data  module  board.  The  weigh 
system  consists  of  an  ANADEX  Transducer  Indicator  digital  panel  meter, 
MODEL  DPM-735,  a  BLH  Electronics  Model  308  Summing  Box,  and  four  BLH 
Electronics  Model  LBPl  load  cells  rated  at  250  pounds  each.  The  ANADEX 
meter  provides  the  excitation  voltage  for  each  of  the  load  cells.  The 
load  cells  return  a  voltage  level  based  on  their  deflection  to  the 
summing  box  which  adds  the  returned  voltages  from  each  load  cell  to 
give  a  single  "summed"  voltage.  This  summed  voltage  goes  to  the  ANADEX 
meter  for  conversion  to  digital  BCD  data  and  ultimate  transmission  to 
the terminal processor. Interface to the keyboard in the user's
terminal is done with an additional 16 I/O data module board.
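
How the terminal processor might use the BCD weight reading can be sketched as follows. The digit order, the weight units, the tolerance and the comparison against an enrolled weight are all hypothetical; this report states only that the meter delivers BCD data and that the scale is intended to prevent tailgating.

```python
def bcd_to_int(digits):
    """Convert a sequence of 4-bit BCD digits (most significant first) to an int."""
    value = 0
    for d in digits:
        if d not in range(10):
            raise ValueError(f"invalid BCD digit: {d}")
        value = value * 10 + d
    return value


def single_occupant(measured_lb, enrolled_lb, tolerance_lb=20):
    """True if the measured weight is consistent with the enrolled user alone.

    The tolerance is a hypothetical value, not one taken from the report.
    """
    return abs(measured_lb - enrolled_lb) <= tolerance_lb


reading = bcd_to_int([1, 8, 5])  # meter shows 185
print(reading, single_occupant(reading, enrolled_lb=180))       # 185 True
print(single_occupant(bcd_to_int([3, 4, 0]), enrolled_lb=180))  # False
```

A reading far above the enrolled weight, as in the second call, is the case that would indicate more than one occupant in the booth.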

Communication  to  the  voice  processor  is  accomplished  via  an 
asynchronous  modem  which  interfaces  to  the  terminal  processor  through  a 
standard TI 990 communications module board. The terminal processor
communicates  with  the  communications  board  via  the  CRU. 


C.  BANDPASS  FILTER  DESIGN 

Figure  12  shows  one  stage  of  the  filter  used  on  the  analog  filter 
board.  The  following  equations  can  be  written  from  this  diagram  and 
can  be  used  to  derive  the  transfer  functions  of  the  circuit: 

\[ \frac{V_0 - V_A}{R_4} + \frac{V_2 - V_A}{R_3} = 0 \]

\[ \frac{V_0}{R_0} + V_1 \left( s C_1 + \frac{1}{R_1} \right) = 0 \]

\[ \frac{V_1}{R_2} + V_2 \left( s C_2 \right) = 0 \]

Figure 12 Two Pole Active Bandpass Filter


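The transfer functions for V1/VA and for V2/VA are

\[ \frac{V_1}{V_A} = \frac{1 + R_4/R_3}{R_0 R_2 C_1 C_2} \cdot \frac{-s R_2 C_2}{s^2 + s/(R_1 C_1) + R_4/(R_3 R_0 R_2 C_1 C_2)} \]

\[ \frac{V_2}{V_A} = \frac{1 + R_4/R_3}{R_0 R_2 C_1 C_2} \cdot \frac{1}{s^2 + s/(R_1 C_1) + R_4/(R_3 R_0 R_2 C_1 C_2)} \]

By letting

\[ CF^2 = \frac{R_4}{R_3 R_0 R_2 C_1 C_2}, \qquad BW = \frac{1}{R_1 C_1}, \qquad G = \frac{R_4}{R_3} \]

the transfer functions can be written as

\[ \frac{V_1}{V_A} = \frac{(1 + G)\, CF^2}{G} \cdot \frac{-s R_2 C_2}{s^2 + s \cdot BW + CF^2} \]

\[ \frac{V_2}{V_A} = \frac{(1 + G)\, CF^2}{G} \cdot \frac{1}{s^2 + s \cdot BW + CF^2} \]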

Now, letting \( R_2 C_2 = 1/CF \), \( G = 1 \), and \( C_1 = C_2 = 10^{-8} \) yields
\( R_1 = 10^{8}/BW \), \( R_0 = R_2 = 10^{8}/CF \) and \( R_3 = R_4 \). Hence by selecting the
desired center frequencies and bandwidths, the appropriate values for
the R's can be determined. The frequency response characteristic for
V0, V1, V2 and V1 + V2 for a filter with CF = 350 Hz and BW = 300 Hz
is shown in Figure 13.
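
As a worked instance of these design relations, the sketch below computes the resistor values for the 350 Hz, 300 Hz example. It assumes that CF and BW in the transfer functions are angular frequencies, so values given in Hz are multiplied by 2*pi before use; the report does not state this conversion explicitly, and the function name is illustrative.

```python
import math


def bandpass_resistors(cf_hz, bw_hz, c_farads=1e-8):
    """Resistor values for one filter stage with G = 1 and C1 = C2.

    Assumes CF and BW in the transfer function are angular frequencies
    (rad/s), so the Hz values are scaled by 2*pi first.
    """
    cf = 2 * math.pi * cf_hz
    bw = 2 * math.pi * bw_hz
    r1 = 1.0 / (bw * c_farads)       # from BW = 1/(R1*C1)
    r0 = r2 = 1.0 / (cf * c_farads)  # from R2*C2 = 1/CF and the CF^2 definition
    return {"R0": r0, "R1": r1, "R2": r2}


for name, ohms in bandpass_resistors(350.0, 300.0).items():
    print(f"{name} = {ohms / 1e3:.1f} kohm")
# R0 = 45.5 kohm
# R1 = 53.1 kohm
# R2 = 45.5 kohm
```

R3 and R4 are not computed because G = 1 requires only that they be equal.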

SECTION III


SOFTWARE 


A.  HOST  PROCESSOR  SOFTWARE 

The host is a TI 990/10 minicomputer that executes the DX10 operating
system. The function of the host is to maintain the database containing
reference data and to provide the operator interface.

The  host  also  provides  additional  security  checks.  Each  user  may 
be  assigned  a  security  level.  When  a  user  successfully  voice  verifies, 
the  host  checks  that  the  user  has  a  high  enough  security  level  to  enter 
the  area.  The  host  also  maintains  a  time  record  for  each  user,  and 
users  can  be  assigned  permission  to  enter  an  area  only  during  assigned 
times  of  day. 
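
The two host-side checks described above can be sketched as follows. The record layout, the field names and the convention that a higher number means a higher security level are assumptions made for illustration; they are not taken from the host software.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class AreaPermission:
    """Hypothetical per-user, per-area record (field names are illustrative)."""
    user_level: int
    earliest: time
    latest: time


def may_enter(permission, area_level, now):
    """Apply the level and time-of-day checks that follow a successful
    voice verification. Returns (allowed, reason)."""
    if permission.user_level < area_level:
        return False, "UNAUTHORIZED LEVEL"
    if not (permission.earliest <= now <= permission.latest):
        return False, "UNAUTHORIZED TIME"
    return True, "OK"


perm = AreaPermission(user_level=2, earliest=time(8, 0), latest=time(16, 0))
print(may_enter(perm, area_level=2, now=time(9, 30)))  # (True, 'OK')
print(may_enter(perm, area_level=3, now=time(9, 30)))  # (False, 'UNAUTHORIZED LEVEL')
print(may_enter(perm, area_level=2, now=time(18, 0)))  # (False, 'UNAUTHORIZED TIME')
```

The reason strings mirror the violation categories that appear in the sample reports.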

The  host  checks  for  permission  to  use  operator  commands  from  the 
system  console.  Only  those  users  assigned  permission  as  operators  can 
issue  commands  to  the  system.  The  operator  can  issue  a  wide  variety  of 
commands  for  maintaining,  updating,  and  interrogating  the  data  base,  and 
for controlling the state of the voice and terminal processors. Figures
14 and 15 give two examples of the many reports that may be generated
by the operator.

The  database  is  stored  as  a  set  of  key  indexed  files  on  the  system 
disk.  Voice  data  and  personal  data  are  stored  in  separate  files.  The 
user  ID  is  the  primary  key  used  to  acquire  data  from  these  files. 

The operator interface consists of a set of system command procedures,
which are stored as text files in three directories on the system
disk. The procedures for normal operator interaction are stored in the
directory ".F0Y3A5JK", the menus are stored in ".PROC", and the system
maintenance commands are stored in ".KM6B34FR". These command procedures
prompt for and collect parameters from the operator. The parameters
are then passed to tasks which carry out the requests.

Several tasks are normally executing when the voice system is running.
These tasks perform communication with the voice processor and
database  control.  The  MAPPER  task  performs  message  switching  functions. 
It  receives  messages  from  the  voice  processor  communications  task  and 
sends  them  to  the  appropriate  task  for  further  processing. 

INTASK and OUTTASK perform communications with the voice processor.
OUTTASK accepts messages from an intertask communications (ITC) channel
(see DX10 manual, volume 3 [11] for an explanation of intertask
communications) and sends them to the voice processor. INTASK receives
messages from the voice processor, checks them for validity and passes them
to MAPPER for further routing.

GETVERFY is the task that accesses the database to obtain reference
data. When a user enters his/her ID on the keyboard in the voice booth,
the ID is passed to GETVERFY. GETVERFY reads the reference data for
that user and sends the data to the voice processor for use in the
verification process.
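
The flow of a user ID through INTASK, MAPPER and GETVERFY and back out toward the voice processor can be sketched with ordinary Python functions and a queue. The message format, the message type names and the in-memory reference store are hypothetical; the actual tasks run under DX10 and exchange messages over ITC channels.

```python
from collections import deque

# Hypothetical reference store; the real system reads key indexed files.
REFERENCE_DATA = {"000001": "reference data for user 000001"}

# Stands in for the ITC channel from which OUTTASK takes outgoing messages.
to_voice_processor = deque()


def get_verify(message):
    """GETVERFY role: look up reference data for the user ID and queue it
    for transmission to the voice processor."""
    reference = REFERENCE_DATA.get(message["user_id"])
    to_voice_processor.append({"type": "REFERENCE", "data": reference})


# MAPPER role: route each message type to the task that handles it.
ROUTES = {"USER_ID": get_verify}


def mapper(message):
    handler = ROUTES.get(message["type"])
    if handler is None:
        raise KeyError(f"no route for message type {message['type']!r}")
    handler(message)


def intask(raw):
    """INTASK role: check an incoming message for validity, then pass it
    to MAPPER. Invalid messages are dropped."""
    if not isinstance(raw, dict) or "type" not in raw:
        return False
    mapper(raw)
    return True


intask({"type": "USER_ID", "user_id": "000001"})
print(to_voice_processor.popleft())
# {'type': 'REFERENCE', 'data': 'reference data for user 000001'}
```

Keeping the routing table in one place mirrors the message switching role the report assigns to MAPPER.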

INTASK, OUTTASK, GETVERFY, and MAPPER are memory resident tasks.
The response of these tasks to events is critical to the performance and
response time of the host. The tasks are therefore locked into memory
at all times so that they can respond immediately to events. The DX10
operating system requires that memory resident tasks be installed in the
system program file (.S$PROGA). The other speech related tasks are
installed in a program file SVSYS.DATABASE.VOICEX.

1.  THE  HOST  DATA  BASE 

The  data  base  consists  of  the  following  files: 


PERSONAL FILE

Contains data relating to nonsecurity attributes of the user
such as address, phone number, etc. Only accessed to generate
reports.

ACCESS CONTROL FILE

Contains enrollment status of user, console ability, and a copy
of the user area control record. This file contains information
used to determine whether a user has access to an area. It is
accessed after successful voice verification to provide
additional security and restrictions such as time of day access
and security levels.

AREA CONTROL FILE

Contains a record for each area a user can enter. The record
contains the time of day a user may enter an area for each day
of the week and an authorization level for that area.

VERIFICATION FILE

Contains voice reference data and verification statistics.

INVENTORY FILE

Contains a record for each user that is in an area. The record
contains the area the user is in, the time the user entered the
area, and the time the user should leave the area.

SYSTEM MAP

Contains status and type information for each portal in the
system (only one in the VVU system). Type information
includes verification, enrollment and area served.

LEVEL FILE

Contains security level for each area and maximum valid area number.


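USER PERFORMANCE SUMMARY

JAN 3, 1980

USER NAME: SMITH, JOHN
USER ID: 000001
USER WEIGHT: 180
ENROLLMENT DATE: JUN 21, 1979
DATE OF LAST SYSTEM ACCESS: NOV 12, 1979


VOICE VERIFICATION STATISTICS

PHRASES FOR SUCCESSFUL VERIFICATION
    1 PHRASE: ................................ 19
    2 PHRASES: ...............................  3
    3 PHRASES: ...............................  0
    4 PHRASES: ...............................  1
    5 PHRASES: ...............................  0
    6 PHRASES: ...............................  0
    7 PHRASES: ...............................  0
TOTAL SUCCESSFUL VOICE VERIFICATIONS: ........ 23
UNSUCCESSFUL VOICE VERIFICATIONS
    NO USER RESPONSE: ........................  5
    VOICE MISMATCH: .......................... 10
SECURITY VIOLATIONS
    USER/AREA INVENTORY CONFLICT: ............  0
    INACTIVE USER: ...........................  1
    UNAUTHORIZED TIME: .......................  0
    UNAUTHORIZED LEVEL: ......................  0
    UNAUTHORIZED AREA: .......................  0
SUCCESSFUL ACCESS ATTEMPTS: .................. 23
OPERATOR OVERRIDES: .......................... 12


NON-VOICE VERIFICATION STATISTICS

SECURITY VIOLATIONS
    USER/AREA INVENTORY CONFLICT: ............  0
    INACTIVE USER: ...........................  0
    UNAUTHORIZED TIME: .......................  0
    UNAUTHORIZED LEVEL: ......................  0
    UNAUTHORIZED AREA: .......................  0
SUCCESSFUL ACCESS ATTEMPTS: ..................  5
OPERATOR OVERRIDES: ..........................  6


Figure 14 Sample User Performance Summary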
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PORTAL  PERFORMANCE  SUMMARY 


DEC 5, 1979

PROCESSOR  PORTAL
    1         1

DIRECTION: FROM AREA 0 TO AREA 1

                                        00:00   08:00   16:00
                                         TO      TO      TO    TOTAL
                                        08:00   16:00   24:00

SUCCESSFUL ACCESS ATTEMPTS                0       0       0       0
UNSUCCESSFUL ACCESS ATTEMPTS              0       0       0       0
USER ID NOT ENROLLED                      0       0       0       0
USER/AREA INVENTORY CONFLICT              0       0       0       0
NO USER RESPONSE                          0       0       0       0
VOICE MISMATCH                            0       0       0       0
SYSTEM MALFUNCTION                        0       0       0       0
PRIVILEGE VIOLATION
   INACTIVE USER                          0       0       0       0
   UNAUTHORIZED TIME                      0       0       0       0
   UNAUTHORIZED LEVEL                     0       0       0       0
   UNAUTHORIZED AREA                      0       0       0       0
OPERATOR OVERRIDES                        0       0       0       0
PHRASES FOR SUCCESSFUL VERIFICATION
   1 PHRASE                               0       0       0       0
   2 PHRASES                              0       0       0       0
   3 PHRASES                              0       0       0       0
   4 PHRASES                              0       0       0       0
   5 PHRASES                              0       0       0       0
   6 PHRASES                              0       0       0       0
   7 PHRASES                              0       0       0       0

DIRECTION: FROM AREA 1 TO AREA 0

(Same rows and columns as above; every legible entry in the sample is 0.)

Figure 15  Sample Portal Performance Summary

OPTIONS FILE

Contains  option  enable  flags  for  system  behavior. 

SPOOL  FILE 

Contains  system  event  history  similar  to  a  system  log. 


2.  THE  HOST  TASKS 

The following tasks are provided in the host:

2.1 DATA BASE MAINTENANCE TASKS

AREACTRL  Maintain area control records
CHGINVT   Manual change of inventory
CLOSE     Close KIF file
CLRDIAG   Clear diagnostic mode
CLRDISK   Clear research file
CLRRSCH   Clear research mode
COMMENT   Send a comment to the system log
EMITIE2   Emit init enroll command
EMITPUR   Emit process user request
ENROLITC  Send operator enrollment response to ENROLLM
ENROLLM   Perform host function of enrollment
GETVERFY  Perform host function of verification
HOSTAC    MSGOPER messages from host
LOGMSG    LOGSPOOL messages
LOGSPOOL  Log system activity to printer and circular file
MSGOPER   Prints alert messages and waits for responses
ODDJOB    Abort enrollment, zeros user performance statistics
OVERRIDE  Sends override to VP; checks inventory for violations
PDACCESS  Creates access control report
PDAREA    Creates users in an area report
PDLOG     Creates system log report
PDMAP     Creates system map report
PDPERSNL  Creates personal data report
PDPORTAL  Creates portal performance report
PDVERFY   Creates user performance report
PRNTDISK  Creates research data report
PRNTREF   Creates reference data report
RSCHDISK  Writes research mode file
SETDIAG   Set diagnostic mode
SETOP     Sets system option record
SETRSCH   Sets research mode
STOPBELL  Sends stop bell command to MSGOPER
TERMAC    MSGOPER messages from terminal
TESTLOG   Tests LOGSPOOL
TESTRSCH  Test host end of research mode
TSTMSGOP  Test MSGOPER
UDACCES1  Enter and modify access control file
UDACCES2  Deletes, changes user state for access
UDAREA    Maintains area inventories
UDMAP     Modifies system map
UDPORTAL  Maintains portal performance statistics
UDPRSNL1  Enter and modify personal data
UDPRSNL2  Modifies the data base enrollment status when a user
          is reenrolled or deleted
USERSUM   Creates enrolled user summary
USRSTAT   Creates user's verification statistics report
ZEROPORT  Zeros portal performance statistics


2.2 OVERHEAD TASKS

ASGNLUNO  Assigns lunos
ASSIGN    Assigns lunos from a file
DIAGDISK  Writes diagnostic mode file
DOWNLOAD  Downloads VP object
DUMPCHAN  Displays ITC channels
INDSR     Input DSR
INTASK    Input driver for VP
HEART     Sends and monitors heartbeat messages
HOSTUPDN  Sends host up/down messages
INPDT     PDT for INDSR
LOOPBACK  Sends and receives loopback messages to a VP
MAPPER    Controls tasks and replicates ITC messages
OUTDSR    Output DSR
OUTASK    Output driver for VP
OUTPDT    PDT for OUTDSR
PURGE     Remove ITC messages from a channel
QUEDSR    Input queue DSR
QUEPDT    PDT for QUEPDT
SYSERROR  Sends system error message to MSGOPER
TIMER     Receives an ITC message, holds it, then resends it


2.3 SECURITY TASKS

AUTHVIOL  Generates users in authorization violation report
CATCH     Periodically checks inventory for users past time
CHEKINV   Checks a user for inventory violations
CHEKSECR  Sends request to perform a command to SECMONT
INITLEVL  Loads area levels
LOGIN     Accepts password
MTHDAY    Converts Julian day to month and day of month form
PDLEVEL   Creates area level report
REWIND    Rewind report temp files
SECMONT   Checks security for verifications and commands
SETSYN    Sets synonym to proc directory
STATUS    Generates status of user report
TESTSECR  Tests SECMONT
UDCONS    Updates console privileges
UDLEVEL   Maintains area level record


2.4  UTILITY  PROCEDURES  USED  BY  VOICE  TASKS 


ABORT     Abort program
CALDATE   Julian to calendar date
CONMSG    Sends alert message to MSGOPER
DELTCUR   Delete current KIF record
DELTKEY   Delete a KIF record
ENTMSG    Enrollment terminated message
GETDAY    Time of day and day of week
GETIME    Get time and day
GETITC    Read an ITC channel
GETNAME   Get the name corresponding to an ID
IARMSG    Invalid access request message
IIDMSG    Invalid ID message
INITMAP   Load system map
INITOP    Load system options
INSRTR    Insert a KIF record
IOSVC     General I/O SVC
JULDATE   Calendar date to Julian date
JULDAY    Calendar date to Julian date
LOADMAP   Replace system map
LOADOP    Replace system options
LOGMSG    Sends log message to LOGSPOOL
OPEN      Open KIF file
OPENCALL  Open a file and return characteristics
OVRMSG    Sends override message to VP
PKEYREAD  Read KIF record by primary key
PRTMSG    Update portal statistics message
RD$VDT    Read VDT
PUTITC    Send an ITC message
READCUR   Read current KIF record
READEQG   Read KIF record >=
READKEY   Read KIF record by key
READNEXT  Read next KIF record
READPREV  Read previous KIF record
RERITR    Rewrite KIF record
SENDITC   Sends some ITC messages
SCI$IF    Interface to SCI
SETEQG    Set currency to scan key indexed file
SVC10     General I/O SVC
TMEMSG    Message to timer
UNLOCK    Unlock KIF record
WR$VDT    Write to VDT
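
The date utilities listed above convert between Julian and calendar dates. The sketch below shows one way such a conversion can be written. It assumes that "Julian" here means the ordinal day of the year (1 to 366), which the report does not state, and the function names are illustrative rather than the actual CALDATE and JULDATE interfaces.

```python
from datetime import date, timedelta


def julian_to_calendar(year, day_of_year):
    """Convert an ordinal day of the year into (month, day of month)."""
    d = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    return d.month, d.day


def calendar_to_julian(year, month, day):
    """Convert a calendar date into its ordinal day of the year."""
    return date(year, month, day).timetuple().tm_yday
```

The year is needed in both directions because the mapping shifts by one day after February in leap years such as 1980.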


2.5  MAIN  TASKS 
MAPPER 

Controls task scheduling of voice tasks, relays ITC messages to
tasks, and provides debugging information. Almost all ITC messages
sent in the system will pass through MAPPER. GETVERFY and
ENROLLM tasks send and receive large amounts of reference data to
the voice processor. To make this process more efficient, they
communicate with INTASK and OUTTASK by means of shared memory. A
message is sent through MAPPER to notify the destination task that
data are present in shared memory. The data are transmitted
directly to the voice processor, avoiding the system overhead
involved in intertask communication.
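
The shared-memory arrangement can be illustrated with a small sketch. The `Mapper` class, the `shared_memory` dictionary, and the "DATA_READY" message are hypothetical stand-ins chosen for the example; the report does not describe the actual message formats or memory layout.

```python
from collections import deque


class Mapper:
    """Stand-in for MAPPER: relays small messages between named tasks."""

    def __init__(self):
        self.inboxes = {}

    def register(self, task):
        self.inboxes[task] = deque()

    def send(self, dest, message):
        self.inboxes[dest].append(message)

    def receive(self, task):
        box = self.inboxes[task]
        return box.popleft() if box else None


shared_memory = {}   # stand-in for the region shared with OUTTASK


def getverfy_send_reference(mapper, user_id, reference_data):
    """Place bulk data in shared memory and relay only a short notice."""
    shared_memory[user_id] = reference_data
    mapper.send("OUTTASK", ("DATA_READY", user_id))


def outtask_poll(mapper):
    """Return the bulk data named by the next notice, if there is one."""
    msg = mapper.receive("OUTTASK")
    if msg and msg[0] == "DATA_READY":
        return shared_memory.pop(msg[1])
    return None
```

Only the short notice travels through the relay, so the cost of relaying a message does not grow with the size of the reference data.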

GETVERFY 

Maintains  verification  database,  and  monitors  and  controls  the 
verification  process.  Hands  control  to  ENROLLM  in  cases  where  an 
enrollment  might  be  required. 


3. HOST SCENARIOS

3.1  VERIFICATION 

1.  A  process  user  request  is  received  by  an  input  DSR  and 
sent  to  MAPPER. 

2.  MAPPER  sends  the  request  to  SECMONT  and  GETVERFY  and  then 
activates  them. 

3.  SECMONT  makes  all  necessary  security  checks  on  the  user 
and  sends  a  command  to  GETVERFY  indicating  the  result.  If 
the  result  is  a  security  rejection,  SECMONT  sends  a  message 
to  LOGSPOOL. 

4. GETVERFY reads the user's verification file and waits for
the command from SECMONT. When it arrives, GETVERFY sends
a response to the voice processor. If voice verification is
required, the reference data are sent to the processor with the
response. A message is sent to UDPORTAL to update the portal
statistics.

5.  When  the  verification  is  complete,  the  result  and  update  data 
are  sent  to  GETVERFY.  GETVERFY  updates  the  statistics, 
rewrites  the  verification  record,  and  sends  a  message  to  MSGOPER. 

6. When the door is opened by the user, the door-opened message is
sent to MAPPER, which sends it to GETVERFY and UDAREA. UDAREA
updates the inventory, and GETVERFY makes certain all users at
that portal have been accounted for and sends the door-opened
message to LOGSPOOL.

7.  LOGSPOOL  receives  messages  indicating  security  check  failure, 
verification  result,  and  a  door  being  opened. 


3.2 ENROLLMENT

1. A process user request is received by an input DSR and sent to
MAPPER.


2.  MAPPER  sends  the  request  to  SECMONT  and  GETVERFY  and  then 
activates  them. 

3.  SECMONT  will  reject  the  user  because  he  will  not  be  active 
and  will  send  the  result  to  GETVERFY. 

4.  GETVERFY  will  try  to  read  his  verification  record  and  will 
find  that  the  record  does  not  exist.  When  the  command  from 
SECMONT  arrives,  ENROLLM  will  be  sent  an  enrollment  check 
command  from  GETVERFY.  The  command  will  include  an  address 
of  a  verification  record  to  use  for  the  new  user. 

5. When the enrollment check command passes through MAPPER, MAPPER
will bid ENROLLM and pass it the message.

6.  ENROLLM  will  see  if  an  enrollment  window  has  been  opened 
for  this  user.  If  it  has  not,  the  enrollment  is  stopped  and 
a  message  sent  to  MSGOPER. 

7.  If  the  window  is  open,  an  alert  message  is  sent  to  MSGOPER 
and  ENROLLM  waits  for  a  response  from  the  operator. 

8.  If  the  response  is  no,  the  enrollment  is  stopped  and  a  message 
is  sent  to  LOGSPOOL. 

9.  If  the  response  is  yes,  the  enrollment  command  is  sent  to 
the  portal  and  ENROLLM  times  out  the  enrollment  process. 

10.  When  the  enrollment  is  finished,  ENROLLM  initializes  a  new 
verification  record  and  sends  a  message  to  GETVERFY 
indicating  completion  of  enrollment. 
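
Steps 6 through 9 amount to a small decision procedure, sketched below. The function name, the return values, and the use of None for "no operator response yet" are assumptions made for illustration; only the task names come from the scenario above.

```python
def enrollment_decision(window_open, operator_response=None):
    """Return (action, task or device notified) for steps 6 to 9."""
    if not window_open:
        return ("STOP", "MSGOPER")              # step 6: no enrollment window
    if operator_response is None:
        return ("WAIT", "MSGOPER")              # step 7: alert sent, awaiting reply
    if operator_response == "no":
        return ("STOP", "LOGSPOOL")             # step 8: operator refused
    return ("SEND_ENROLL_COMMAND", "PORTAL")    # step 9: operator approved
```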


3.3  REQUEST  TO  PERFORM  A  COMMAND 

1.  In  the  proc  of  each  command  is  a  call  to  CHEKSECR. 

2.  CHEKSECR  reads  the  password  and  ID  of  the  operator;  the 
ID  of  the  user,  if  any;  and  the  command  number  and  set. 

3.  CHEKSECR  sends  the  information  to  SECMONT  which  makes  the 
authorization  check.  SECMONT  sends  the  information  about 
the  command  and  its  decision  to  LOGSPOOL.  SECMONT  sends  a 
response  to  CHEKSECR  which  assigns  that  value  to  a  synonym. 
The  proc  stops  SCI  if  the  request  is  not  allowed. 


B.  VOICE  PROCESSOR  SOFTWARE 

The voice processor consists of a TI 990/12 minicomputer with 128
Kbytes of memory and communication interfaces to both the host and
terminal processors. The function of the voice processor is to perform,
under control of the host processor, verification and enrollment of
individuals using preprocessed voice data, as sent from the terminal
processor via a 9600 baud asynchronous serial communications channel.
Although the use of the 990/12 as the voice processor and the size of
its memory were selected to provide the processing power for servicing
multiple terminal processors (and memory size for servicing two terminal
processors), the software installed on the delivered voice processor has
not been expanded to provide for more than one terminal processor.

The voice processor functions are performed by a set of cooperating,
asynchronous tasks, primarily communicating with each other via
dedicated "intertask communication (ITC)" channels (dedicated memory
areas). Each task is allocated a portion of CPU time (50 ms time-slice)
by a priority scheduling algorithm which accounts for interrupts,
suspended tasks, etc. This task scheduling is part of the underlying
"TX990" operating system that also provides memory management, interrupt
processing, intertask communication, interval timing, task
initialization, etc. TX990 is described in more detail in the "TX990
Operating System Documentation." [12]

The interrelationship of the asynchronous tasks on the voice
processor is shown in Figure 16. For comparison, the task block diagram
for the DSG voice processor is shown in Figure 17, where the functions
within the dashed line have been moved to the terminal processor for the
VVU system to reduce the processing necessary for each voice terminal,
enabling one voice processor to service many lower cost terminal
processors.

Almost all the tasks have been written exclusively in Pascal.
However, a few selected procedures have been written in 990 assembly
language as needed for speed. The TX990 operating system itself was
written entirely in 990 assembly language.

1.  TASK  TO  COMMUNICATE  WITH  THE  TERMINAL  PROCESSOR 

The communications routines at both the voice processor and the
terminal processor ends are designed to accommodate errors in
transmission by requiring positive acknowledgments (ACKs) to be received
by the sender before discarding any message that has been sent. Although
it is doubtful that errors will ever be introduced by the channel itself
(at least in our environment), errors can occur (missed characters) due
either to full buffers in the receiver or to missed interrupts because
other higher priority, noninterruptible processing is occurring that
lasts longer than the time between the receipt of two sequential input
characters (approximately every 1 ms). Any time such an error occurs, a
retransmission is required, which increases the load on the
communication channel. In addition to the loss of the original message,
the error can also be due to the loss of the return ACK. This loss of
ACKs means that the sender, after an appropriate waiting time, must
spontaneously (and needlessly) retransmit the block of data
corresponding to the missing ACK. The receiver, of course, will have to
discard the extra message, since it had already been received. The
object then is to reduce the maximum possible delay between the receipt
of input characters as much as possible in order to decrease the
frequency of characters (and hence messages) being missed.

Figure 16  Inter-Task Communications for DSG's Voice Processor

The task concerned with communications with the terminal processor
operates using this error-recoverable protocol. Data are received from
the terminal processor on an interrupt driven basis with each byte being
placed in a circular input queue (200 bytes long) of numbered input
messages, by an input device service routine under control of the
operating system. After an entire block of data is received correctly,
an acknowledgment is sent to the terminal processor. If messages prior
to a correctly received message are missing, NAKs are also sent to flag
the missing messages. As time permits, another device service routine
removes data from the input queue, checking parity, stripping off
synchronization and "transparent" characters, and blocking the data for
transmission to another task either via ITC channels or via a common
block, in the case of preprocessed speech data. One exception is that
"heartbeat" messages received from the terminal processor are
acknowledged back to the terminal processor, but are not passed on.
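
The receiving side of this protocol can be sketched as follows. This is an illustration under stated assumptions, not the delivered device service routines: it assumes that blocks carry consecutive integer numbers, that heartbeats are numbered like other blocks, and that payloads are passed on in order of arrival. The class and method names are hypothetical.

```python
class Receiver:
    """Sketch of ACK/NAK generation for numbered message blocks."""

    def __init__(self):
        self.expected = 0       # lowest block number not yet received
        self.received = set()   # block numbers received so far
        self.delivered = []     # payloads passed on to other tasks

    def on_block(self, number, payload, heartbeat=False):
        """Return the protocol replies for one correctly received block."""
        replies = [("ACK", number)]
        if number in self.received:
            return replies      # duplicate caused by a lost ACK: discard it
        self.received.add(number)
        # Flag earlier blocks that have not arrived yet.
        for missing in range(self.expected, number):
            if missing not in self.received:
                replies.append(("NAK", missing))
        while self.expected in self.received:
            self.expected += 1
        if not heartbeat:       # heartbeats are acknowledged but not passed on
            self.delivered.append(payload)
        return replies
```

A block that arrives twice is acknowledged both times, so a sender whose first ACK was lost can still release the message.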

Data are also received from other voice processor tasks via ITC
channels, and are blocked for transmission to the terminal processor,
with appropriate parity and synchronization character generation and
message block number assignment. After this processing, the data are
queued for transmission to the terminal processor (along with ACKs and
NAKs to the incoming messages) via an output device service routine,
also operating on an interrupt driven basis under control of the
operating system.

In summary, the terminal processor communications task provides a
transparent interface between the voice processor tasks and the tasks
that have been moved from DSG's voice processor to the terminal
processor in the VVU system. The software at the terminal processor end
of this communication link is described in Part C.2 of this section.

2.  TERMINAL  TASK 

The name for this task was assigned before the inception of the
"terminal processor" in the VVU system and, regrettably, adds
confusion to the description of the software system. Rather than
having any direct relation to the terminal processor, this task controls
the state of the voice processor and its control of the terminal
processor. In fact, the terminal task is primarily a controller for a
large state table, driven both by commands from the host processor and
by requests and responses from the voice processing task and the tasks
in the terminal processor.

When the booth is empty, the terminal task is idle, except for
responding to occasional heartbeat messages from the host. (Terminal
processor heartbeats are handled entirely in the communications routine
and do not affect the terminal task.) When a door to the booth is
opened, the terminal task starts a timer which will prompt a message to
close the door if the door is held open too long (one minute), or which
will disable the booth after three such messages have been prompted. If
the door is closed without any weight in the booth, the terminal task
returns to its idle state. However, if weight is present after the door
closes, commands to lock the booth doors are sent to the terminal
processor (at least one booth door is always locked), the initial timers
are reset, and additional timers are initiated to check that some user
operation is requested (by the input of a valid identification number
via the keyboard) within some prescribed time interval to help ensure
that the user is not sabotaging the booth. Once a valid identification
number has been input, the number is received by the terminal task and
is passed on to the host for determination of the proper action to be
taken. If the user is to be either enrolled or verified, the request is
received by the terminal task, causing a state transition in its state
table, and the request is passed on to the voice processing task in the
voice processor, which initiates the appropriate prompting and
processing of speech input. In addition, the terminal task will also
initiate still more timers to ensure that the user responds within a
prescribed interval and that the entire verification or enrollment does
not take an abnormally long time.

If no valid input is received from the keyboard in the booth within
a prescribed time, both doors remain locked, a terminal task state table
transition is made, the user is prompted to "call for assistance," and
exit from the booth is possible only upon receipt of an operator
initiated override from the host (security control).
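
A state table of this kind is often represented as a lookup keyed on the current state and the incoming event. The sketch below illustrates the idea with a handful of transitions drawn from the description above. The state and event names are invented for the example, and the actual table in the delivered software is much larger and is not reproduced in this report.

```python
# (current state, event) -> next state; all names are illustrative.
TRANSITIONS = {
    ("IDLE", "door_opened"): "DOOR_OPEN",
    ("DOOR_OPEN", "door_closed_empty"): "IDLE",
    ("DOOR_OPEN", "door_closed_occupied"): "AWAIT_ID",
    ("AWAIT_ID", "valid_id"): "AWAIT_HOST",
    ("AWAIT_ID", "timeout"): "CALL_FOR_ASSISTANCE",
    ("AWAIT_HOST", "verify_request"): "VERIFYING",
    ("AWAIT_HOST", "enroll_request"): "ENROLLING",
    ("CALL_FOR_ASSISTANCE", "operator_override"): "DOOR_OPEN",
}


def step(state, event):
    """Look up the next state; unlisted events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="IDLE"):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Keeping the transitions in a table rather than in nested conditionals makes it straightforward to add the timer-driven transitions described above as further entries.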

Once a user has been verified, the terminal task relates the
success to the host, undergoes a state transition, and if the weight
remaining in the booth after the verified user's weight is subtracted
exceeds forty pounds, the timers waiting for keyboard entry are
initiated again and the above procedure is repeated (with appropriate
additional state transitions) for another user identification. Also, if
a user is not verified, the above procedure is repeated (users may try
to verify as many times as they wish). Once either the remaining weight
is less than forty pounds (for a verification) or an enrollment has been
completed, another terminal task state transition occurs, one of the
doors is opened (opposite from entry for verification; same as entry
for enrollment) and the door-open timers are again initiated.
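
The weight test that guards against tailgating reduces to a subtraction and a comparison, sketched below. The function and constant names are illustrative. The text does not say what happens at exactly forty pounds, so the sketch assumes that only a remainder strictly greater than forty pounds triggers another identification.

```python
TAILGATE_LIMIT_LB = 40   # threshold stated in the text


def next_action(booth_weight_lb, verified_weights_lb):
    """Decide what follows a successful verification.

    booth_weight_lb is the scale reading; verified_weights_lb holds the
    enrolled weights of the users verified so far in this booth cycle.
    """
    remaining = booth_weight_lb - sum(verified_weights_lb)
    if remaining > TAILGATE_LIMIT_LB:
        return "PROMPT_FOR_NEXT_ID"    # someone else is still unaccounted for
    return "OPEN_EXIT_DOOR"
```

With a 180 pound user alone in the booth the exit door opens, while a reading of 350 pounds after the same verification would prompt for a second identification number.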

In  addition  to  these  monitors  during  normal  operation,  the  terminal 
task  will  also  notify  the  host  of  abnormal  conditions  existing  in  the 
booth,  such  as  both  doors  being  opened  at  the  same  time,  in  which  case 
portal  security  has  been  lost.  In  such  cases  of  monitor  abnormalities, 
the  terminal  task  will  send  a  message  to  the  host  which  will  be  printed 
on  the  alert  terminal  in  security  control  to  prompt  appropriate  action 
by  security. 

3.  VOICE  PROCESSING 

Another unfortunate semantic confusion exists with the naming of
the voice processing task, which is one of the tasks running on the
voice processor. Nevertheless, the function of the voice processing
task is to either verify or enroll a single user by command from the
host, as passed on by the terminal task. Hence, the voice processing
task sits idle until a command is received via an ITC channel from the
terminal task.

If a received command is to verify a user, the task first selects
the set of four phrases to be prompted based upon both a random number
generator and the frequency of usage of the words, as stored in the
reference file for the user as sent from the host processor. (Readers
desiring more details on the algorithm are referred to Section IV and
Appendix I.) The voice processor then transmits a message to the
terminal processor to prompt the first selected phrase, to collect input
speech data and to return the data to the voice processor. The
communications task in the voice processor relays the data to the voice
processing task via a common block area of memory (to avoid extra
movement of this large bulk of data) as it is received from the terminal
processor, and if the reference data for all of the prompted words have
arrived from the host, the comparison between the input and reference
data is begun. As soon as a match has been made between the input and
the reference, an abort signal is sent to the terminal processor to stop
collecting and processing data. In case the match is not sufficiently
good, and the algorithm requires more speech data, another message is
sent to the terminal processor to prompt another phrase, and this
process is repeated for up to seven phrases total. Whenever a final
decision is made to verify or not-verify the speaker, the decision is
sent to the terminal task, which uses the decision to make the
appropriate state transition in its state table, thus prompting further
action (another verification attempt or the user exiting the booth). If
the user is verified, the user's reference data are updated, and this
updated reference file is sent back to the host for storage on disc.
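
The phrase-by-phrase structure of this decision can be sketched as a loop that stops as soon as the evidence is good enough or seven phrases have been used. The scoring rule below (a running average of per-phrase match errors compared with a fixed threshold) is a placeholder invented for the example; the actual decision rule is the one described in Section IV and Appendix I, and it is not reproduced here.

```python
MAX_PHRASES = 7   # upper limit stated in the text


def verify(phrase_scores, accept_threshold):
    """Return (verified, phrases_used).

    phrase_scores yields one match error per prompted phrase (lower is
    better). Both the scores and the threshold are hypothetical inputs.
    """
    total = 0.0
    used = 0
    for score in phrase_scores:
        if used == MAX_PHRASES:
            break
        used += 1
        total += score
        if total / used <= accept_threshold:
            return True, used       # good enough: stop prompting
    return False, used
```

Under this kind of rule most legitimate users finish after one phrase, which is consistent in spirit with the sample statistics in Figure 14, where 19 of 23 successful verifications needed a single phrase.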

Whenever the terminal task passes the identification number of a
user that is not enrolled back to the host, if the personnel and
security information for that number have already been properly entered
and security has specifically allowed the user's enrollment at this
time, the host will send an enrollment command to the terminal task in
the voice processor, which in turn is relayed to the voice processing
task. Upon receipt of an enrollment command, the voice processing task
begins prompting the first of at least twenty phrases used to enroll the
user. At least the first four phrases are used to establish initial
reference patterns, which are then used to compare with subsequent input
data. The largest part of these subsequent comparisons is identical with
the processing used during a verification, and hence uses the same
procedures. If enrollment is completed successfully, the generated
reference file is sent to the host, along with the user's identification
number and weight for storage on disc. If enrollment is terminated
unsuccessfully, no reference data are returned, but a message is sent to
the host telling it of the termination. If during enrollment, security
were to determine that the user was not speaking correctly by monitoring
his speech, the operator could abort the enrollment from the command
terminal on the host. In this case, a message is received at the voice
processing task directly from the host communications task (INTASK) and
the enrollment is aborted. The terminal task does not know (or care)
whether the abort came from the host or was generated internally by the
voice processing task due to inadequate data. The list of reasons why
enrollment may be terminated by the voice processing task itself is
given in

After enrollment has either completed or terminated, the appropriate
messages are sent to the terminal processor via an ITC message to
the terminal communications task, which will inform the user of the
completion or termination, will prompt him to return via the door he
entered, and will direct the booth hardware to unlock the door.

4.  HOST  COMMUNICATIONS  TASKS 

Communications  to  and  from  the  host  are  controlled  by  two  separate 
tasks:  INTASK  and  OUTTASK.  The  actual  device  drivers  in  the  operating 
system  are  almost  identical  to  those  used  for  communicating  with  the 
Terminal  Processor. 

During its time slice, INTASK looks for an input message to be
processed. If none exist, the task returns the remainder of its time
slice to the operating system. If messages exist, INTASK processes them
until either its time or the queue of incoming messages is exhausted. If
the incoming message is protocol (ACK, NAK, WAK), it is passed on to
OUTTASK for appropriate action. If it is text, INTASK passes it on to
the appropriate ITC channel to be sent to another task, and passes an
ACK to OUTTASK for return to the host. If the ITC channel is full, a WAK
is returned to the host through OUTTASK. If either INTASK's input queue
is full or a transmission error occurs for a recognizable message, a NAK
is returned to the host. All of these appropriate actions are
implemented with a state table similar to that described in the Terminal
Task section.
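
The handling rules for a single incoming host message can be summarized in a short sketch. The message representation (a dictionary with "kind" and "channel" keys) and the function signature are assumptions made for illustration; the delivered task implements these rules with a state table rather than a chain of conditionals.

```python
PROTOCOL_KINDS = ("ACK", "NAK", "WAK")


def intask_handle(message, itc_has_room, input_queue_full=False,
                  transmission_error=False):
    """Return (destination, reply) for one incoming host message.

    destination is where the message is passed on (or None), and reply
    is the protocol response handed to OUTTASK for the host (or None).
    """
    if input_queue_full or transmission_error:
        return (None, "NAK")
    if message["kind"] in PROTOCOL_KINDS:
        return ("OUTTASK", None)         # protocol traffic goes to OUTTASK
    if not itc_has_room:
        return (None, "WAK")             # destination ITC channel is full
    return (message["channel"], "ACK")   # text: deliver and acknowledge
```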

OUTTASK is responsible for messages being transmitted from the
voice processor to the host. The procedure GETMSG places messages from
ITC channels into a queue of areas acquired from the heap, until there
are no more messages, the maximum number (32) of messages awaiting
transmission has been reached, or the heap has been exhausted.
Initiation of the next message to be transmitted is done by SNDMSG,
which also preloads counters for TIMMSG to decrement for determining
when to spontaneously retransmit unacknowledged messages. After three
retransmissions, unacknowledged messages are discarded. (Note that the
error recovery capability of host-to-voice processor communications is
not nearly as sophisticated as the voice-to-terminal processor
communications, which never discards unacknowledged messages.) The
procedure GETFROM uses ACKs, NAKs and WAKs from the host (via an ITC
from INTASK) either to initiate retransmissions (for NAKs and WAKs) or
to remove messages from the queue of outgoing messages (for ACKs). The
procedure GETFOR queues ACK, NAK and WAK messages received via an ITC
channel from INTASK for transmission to the host in response to messages
received at the voice processor from the host.
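
The queue limit and the retransmission policy can be sketched as follows. The class is a simplified stand-in, not the GETMSG, SNDMSG and TIMMSG procedures themselves: it transmits each message as soon as it is queued, measures time in abstract ticks, and uses an arbitrary retry interval, all of which are assumptions made for the example.

```python
MAX_QUEUED = 32        # maximum messages awaiting transmission (from the text)
MAX_RETRANSMITS = 3    # retransmissions before a message is discarded


class OutQueue:
    """Sketch of an outgoing queue with timed retransmission."""

    def __init__(self, ticks_per_retry=5):
        self.ticks_per_retry = ticks_per_retry
        self.pending = {}   # block number -> [payload, countdown, retries]
        self.sent = []      # log of (block number, payload) transmissions

    def enqueue(self, number, payload):
        """Queue and transmit a message; refuse it if the queue is full."""
        if len(self.pending) >= MAX_QUEUED:
            return False
        self.pending[number] = [payload, self.ticks_per_retry, 0]
        self.sent.append((number, payload))
        return True

    def on_ack(self, number):
        """An ACK removes the message from the outgoing queue."""
        self.pending.pop(number, None)

    def tick(self):
        """Advance the timers; retransmit or discard expired messages."""
        for number in list(self.pending):
            entry = self.pending[number]
            entry[1] -= 1
            if entry[1] > 0:
                continue
            if entry[2] == MAX_RETRANSMITS:
                del self.pending[number]     # give up after three retries
            else:
                entry[2] += 1
                entry[1] = self.ticks_per_retry
                self.sent.append((number, entry[0]))
```

An unacknowledged message is therefore transmitted at most four times in total (the original plus three retransmissions) before it is dropped.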


C.  TERMINAL  PROCESSOR  SOFTWARE 

The terminal processor consists of a TI 990/5 minicomputer with
associated hardware. The function of the terminal processor is to
perform local processing at the voice booth, which consists of speech
synthesis, data collection, speech preprocessing and portal control. The
terminal processor communicates with the rest of the voice
authentication system via a serial communications interface. It receives
instructions from the voice processor and passes speech and event data
back to the remainder of the system.

Terminal processor functions are performed by a set of cooperating
asynchronous processes. Each basic function in the system is performed
by a separate process. These processes communicate with one another
through semaphores and buffers in shared memory. Each process is
allocated a portion of CPU time by a priority scheduling algorithm.

1.  OPERATING  SYSTEM 

The terminal processor runs the TIPMX operating system. This operating
system provides the kernel functions of task scheduling and
synchronization. TIPMX is described in detail in the "TI PASCAL
MICROPROCESSOR EXECUTIVE USER'S GUIDE" [13].

2.  SPEECH  PROCESSING 

Several speech processing tasks are performed in the terminal
processor. First, the terminal processor is responsible for speech
synthesis for prompts and messages. The data for the speech synthesizer
are stored locally in the terminal processor's memory. Synthesis is
initiated via a message from the voice processor using a message
identification code. The terminal processor uses this code to find the
synthesis data and then causes the message to be spoken.

Speech data collection is initiated by a prompt. The identity of the
four words in the prompt is provided in the request from the voice
processor. The terminal processor starts the speech synthesizer and
begins collecting data (at 10 millisecond intervals). Data are discarded
until twenty samples before the end of the prompt, at which time
"preprocessing" of the data begins. Preprocessing first regresses the
data using sine/cosine basis functions. The data are then normalized and
scaled, and the result is blocked for transmission to the voice
processor. After each group of 8 samples has been collected and
processed, the block is transmitted to the voice processor over the
serial communications link.

When sufficient data have been collected, speech data collection and
preprocessing can be stopped by a message from the voice processor.
This is the normal way to stop data collection. Otherwise, the terminal
processor will stop data collection after 400 samples have been
collected (4 seconds). At the end of data collection, an end-of-speech
message is sent to the voice processor.

3.  COMMUNICATIONS 

The communications portion of the terminal processor software is
responsible for receiving instructions from the voice processor and
transmitting data to it. The communication software is capable of
operating with an imperfect link; that is, error detection and recovery
are performed. All messages have a two byte checksum and a sequence
number. Errors within a message are detected with the checksum, and lost
messages are detected when sequence numbers are missing. All messages
are acknowledged by the receiver, and the message data are retained by
the sender until the message has been acknowledged. Thus, if a message
has an error or is lost, it can be retransmitted.

The  communications  processing  in  the  terminal  processor  operates 
through  several  queues,  which  are  first-in,  first-out  buffers  implement¬ 
ed  with  linked  lists.  Each  queue  in  the  system  has  a  specific  function 
and  contains  messages  that  either  have  been  received  or  that  are  being 
sent. 


Data being sent to the voice processor are placed on a queue. When a
block of data reaches the head of the queue, it is assigned a sequence
number and transmitted. The block of data is then moved to another
queue to await acknowledgment. If the data are not received correctly,
they are retransmitted. When the data have been successfully received
by the voice processor and acknowledged, the buffer containing the data
is removed from the acknowledgment queue and made available for more
data.

Data received from the voice processor are also queued. If a message
is correctly received, it is held until all messages with lower sequence
numbers have been received. This assures that the data are processed in
the same order in which they were sent. When all lower numbered messages
are accounted for, the message is placed on a queue based on its opcode
and is removed by the task that processes that type of message.
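
The in-order delivery rule can be sketched as a small resequencing buffer. This is only an illustration under stated assumptions: the class and attribute names are hypothetical, sequence numbers are taken to wrap from 127 back to 0, and duplicate or stale messages are not handled.

```python
SEQ_MODULUS = 128  # sequence numbers range from 0 to 127


class Resequencer:
    """Hold correctly received messages until all earlier ones have arrived."""

    def __init__(self, first_expected=0):
        self.expected = first_expected  # next sequence number to release
        self.held = {}                  # out-of-order messages by sequence number
        self.by_opcode = {}             # opcode -> list acting as that task's queue

    def receive(self, seq, opcode, payload):
        """Accept one message and release every message that is now in order."""
        self.held[seq] = (opcode, payload)
        while self.expected in self.held:
            op, data = self.held.pop(self.expected)
            self.by_opcode.setdefault(op, []).append(data)
            self.expected = (self.expected + 1) % SEQ_MODULUS
```

A message that arrives early simply waits in `held`; once the missing message is retransmitted and received, the whole run is released to the opcode queues in sending order.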

A message begins with a header that controls routing and error
detection. The header consists of four one-byte fields. The first byte
specifies the type or function of the message. A message may be data,
an acknowledgment of a previously transmitted message, a retransmission
of lost data, or a request for retransmission. Only data messages are
acknowledged. The second byte contains the sequence number of the
message, the value of which ranges from 0 to 127.

The third byte of the header contains the length of the message,
including the header. The last byte of the header is the message
opcode. This opcode specifies the function of the message and is used to
determine which task within the terminal processor is to receive it.
Each task in the terminal processor looks in a queue for incoming
messages. The opcode byte in the message header is used to determine the
queue in which to place it.

Each message ends with a two byte checksum. Each byte is an
independent checksum using a different algorithm, providing extra error
detection capability. When a message has been successfully received, an
acknowledgment is immediately queued to be sent, and the message is
placed on a queue to be processed.
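
A minimal sketch of this message layout follows. The report does not name the two checksum algorithms, so an additive sum modulo 256 and an exclusive-OR are assumed here purely for illustration. The function names are hypothetical, and the choice to count the checksum bytes in the length field is also an assumption.

```python
HEADER_LEN = 4    # type, sequence number, length, opcode
CHECKSUM_LEN = 2  # two independent one-byte checksums


def checksums(data: bytes):
    """Return (additive sum mod 256, XOR of all bytes); both are assumed algorithms."""
    total = sum(data) & 0xFF
    parity = 0
    for byte in data:
        parity ^= byte
    return total, parity


def build_message(msg_type, seq, opcode, payload=b""):
    """Assemble header + payload + two checksum bytes."""
    if not 0 <= seq <= 127:
        raise ValueError("sequence number must be in 0..127")
    length = HEADER_LEN + len(payload) + CHECKSUM_LEN  # assumed to include checksum
    body = bytes([msg_type, seq, length, opcode]) + payload
    return body + bytes(checksums(body))


def parse_message(frame: bytes):
    """Return (type, seq, opcode, payload), or None if the frame fails a check."""
    if len(frame) < HEADER_LEN + CHECKSUM_LEN:
        return None
    body, trailer = frame[:-CHECKSUM_LEN], frame[-CHECKSUM_LEN:]
    if bytes(checksums(body)) != trailer:
        return None
    msg_type, seq, length, opcode = body[:HEADER_LEN]
    if length != len(frame):
        return None
    return msg_type, seq, opcode, body[HEADER_LEN:]
```

Using two different checksum functions means that an error pattern which happens to cancel in one (for example, two bytes changed by equal and opposite amounts in the sum) is still likely to be caught by the other.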

During initial debugging of the system, ACKs from the voice processor
to the terminal processor were being lost, due to high priority speech
processing. The device service routine (DSR) that collects speech data
requires more than 1 millisecond to collect data from all 14 filters.
This caused the communications system to miss characters occasionally.
It was decided to reassign the interrupt priority levels in the terminal
processor to allow processing of a communications input character to
interrupt the reading of the filter bank. This causes an occasional skew
of approximately 0.3 ms between two parts of the spectrum, which is
sampled once every 10 ms. This skew is insignificant, and the alteration
in priority greatly improved communication throughput by reducing errors
and hence retransmission of data. The resulting interrupt priority
levels are:


Communications:     level 3 (highest)
Speech input:       level 4
Clock:              level 5
Portal/scale:       level 9
733 terminal:       level 12 (lowest)


Since the communications interrupt now has the highest priority, the
communications driver in the terminal processor was partitioned to
reduce the maximum elapsed time between checks for other interrupts,
thereby minimizing the delay that communications interrupt processing
imposes on the lower priority devices.


An additional enhancement to the voice/terminal processor
communications was the addition of an "inquire" type of message from the
terminal to the voice processor. Since the speech data blocks are quite
long (88 bytes: header information, a two byte checksum, and 8 samples
[80 ms] of speech data), spontaneous retries of unacknowledged data from
the terminal processor significantly increased the traffic on the
communication line. To avoid this, timeouts of unacknowledged messages
cause the transmission of an "inquire" message of only six bytes.
Receipt of this message at the voice processor will cause either an ACK
or a NAK to be returned, with the terminal then taking the usual
appropriate action.

4.  PORTAL  CONTROL

The  portal  control  task  is  responsible  for  control  of  the  booth. 
It  monitors  the  door  sensors,  controls  the  locks  and  reads  the  weight 
scale.  When  there  is  a  significant  change  in  the  state  of  the  booth,  a 
message  is  sent  to  the  voice  processor.  For  example,  the  detection  of 
weight  on  the  scale  causes  a  message  to  be  sent  to  the  voice  processor, 
which  then  requests  that  the  "HELLO"  message  be  spoken. 


The portal control task is implemented as a finite state machine.
The state transition tables are stored in the terminal processor. These
tables determine the valid states and responses of the system. For
example, the entry door is opened by a sequence that requires that the
access door be locked before the entry door can be unlocked. This
assures that the booth is always in a valid state and improves security.
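
The door interlock can be illustrated with a very small table-driven state machine. The state and event names below are hypothetical; the actual transition tables stored in the terminal processor are not reproduced in this report. The sketch only shows how a table that lacks a "both doors unlocked" state enforces the rule.

```python
# (current state, event) -> next state. Any pair not listed is ignored.
# State and event names are illustrative, not taken from the report.
TRANSITIONS = {
    ("BOTH_LOCKED", "unlock_entry"): "ENTRY_UNLOCKED",
    ("ENTRY_UNLOCKED", "lock_entry"): "BOTH_LOCKED",
    ("BOTH_LOCKED", "unlock_access"): "ACCESS_UNLOCKED",
    ("ACCESS_UNLOCKED", "lock_access"): "BOTH_LOCKED",
}


def step(state, event):
    """Return the next booth state; invalid requests leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(state, events):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Because no table entry leads to a state with both doors unlocked, a request to unlock the entry door while the access door is unlocked has no effect until the access door has been locked again.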


5.  LOCAL  TERMINAL  INTERFACE 

The  terminal  processor  contains  a  driver  for  a  local  terminal. 
This  terminal  is  connected  to  an  interface  in  the  990/5  and  is  used  by 
the  terminal  processor  to  print  statistics  for  performance  measurement. 

The  keyboard  on  the  local  terminal  is  periodically  examined  by  the 
terminal  processor  to  accept  commands  to  print  debugging  and  performance 
information.  The  system  can  print  lists  of  the  currently  running  tasks 
and  their  states;  communications  statistics,  including  the  number  of 
retries;  and  speech  statistics.  Software  switches  are  present  that  can 
be  toggled  through  the  keyboard  to  cause  dynamic  performance  information 
to  be  printed.  This  performance  information  includes  the  average  time 
to  verify  and  the  amount  of  data  collected  and  processed  for  each 
phrase. 

D.  DOWNLOADING  THE  VOICE  PROCESSOR  AND  THE  TERMINAL  PROCESSOR 

The loaders are used to initially load the voice processor and
terminal processor. All code for the system is stored on disk files on
the host and is downloaded under operator control. Since a direct memory
access channel exists between the host and the voice processor, this
memory port is used to load the voice processor. The host reads the
object code from disk, performs relocation and loads the binary object
code directly into the voice processor's memory.

The command to load the voice processor is LOADVP. The operator
provides the name of the file on the host containing the object code for
the voice processor. This code is then loaded into the voice
processor's memory.

Since the only connection between the terminal processor and the
host is through the voice processor, the terminal processor is loaded by
the voice processor. The data to load are obtained from a file on the
host, passed to the voice processor, and then passed on to the terminal
processor.

When the LOADTP command is given by the operator, the host first
loads the voice processor with a loader program for the terminal
processor. It is then necessary for the operator to start this program
executing via the front panel on the voice processor. The host provides
the data across the TILINE* coupler to memory in the voice processor.
The loader program in the voice processor adds checksum and header
information and transmits the data to the terminal processor via the
serial communications interface. The positive acknowledgment required
from the terminal processor and the two bytes of checksum sent with each
block of data assure that the information has been correctly received.
When the data are acknowledged, the voice processor acknowledges them to
the host by setting a flag in memory, and the host provides another
block of data to the voice processor.

The terminal processor has a loader present in read only memory.
This loader was modified from the normal front panel loader as part of
this project. The loader communicates through the serial communications
interface on the terminal processor. Data are loaded in blocks, with
each block containing sequence number information and checksums. As each
block is received, it is checked for validity. Data blocks are
acknowledged by returning a message to the voice processor. This allows
reliable loading of the terminal processor across an imperfect
communications link. Loading of the terminal processor is initiated by
pressing the halt and load buttons, either those on the front panel of
the terminal processor or those located remotely (at the host) and
attached in parallel with the terminal processor's front panel.

Once the terminal processor is loaded, the loader in the voice
processor halts, and the host loads the voice processor's memory. The
voice processor must then be started from its front panel, which
completes the loading process.

SECTION  IV


BISS  VOICE  AUTHENTICATION  ALGORITHM  MODIFICATIONS 


In order to provide a baseline for discussion of the modifications
made to the algorithm delivered on the BISS-ASV-ADM, the algorithm as
defined in the "Segment Specification for Entry Control System" [11] has
been reproduced as Appendix III of this final report. Rather than
giving line-by-line modifications to Appendix III, narrative
descriptions of the changes are given in this section. A brief summary
of the differences among the BISS system, the 980-based CIC system, and
the DSG and VVU systems is given in Tables 5 through 8.

A.  WORD  SET 

The set of prompting words used for the old 980-based CIC system,
for the laboratory test data set, for the DSG system, and for the VVU
system is shown below. It is different from that given in Appendix III
for the BISS system.

     GOOD      BEN       SWAM      NEAR
     PROUD     BRUCE     CALLED    HARD
     STRONG    JEAN      SERVED    HIGH
     YOUNG     JOYCE     CAME      NORTH

While the words are still selected randomly during verification, the
prompting order during enrollment has been fixed for both the DSG and
VVU systems. Since the need for randomization (to prevent the use of
tape recorders by impostors) does not apply during enrollment, fixing
the phrase order allows the enrollee to use a printed list for
assistance in understanding the phrases, which not only are new to the
user, but which have degraded in quality from the old BISS system due to
the need to use LPC coded prompts, as explained earlier. The prompting
order for the phrases during enrollment is given in Table 9.
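
Random phrase selection during verification can be sketched as follows. The sketch assumes that each four-word prompt takes one word from each position class (first word, name, verb, final word), which is the pattern followed by every phrase in Table 9. The report does not state the selection procedure itself, and the function name is hypothetical.

```python
import random

# One list per word position, grouped as the phrases in Table 9 suggest.
WORDS = [
    ["GOOD", "PROUD", "STRONG", "YOUNG"],
    ["BEN", "BRUCE", "JEAN", "JOYCE"],
    ["SWAM", "CALLED", "SERVED", "CAME"],
    ["NEAR", "HARD", "HIGH", "NORTH"],
]


def random_phrase(rng=random):
    """Pick one word per position to form a four-word prompt."""
    return " ".join(rng.choice(position) for position in WORDS)
```

Under this assumption there are 4 x 4 x 4 x 4 = 256 possible prompts, of which the 32 phrases in Table 9 are a fixed subset used for enrollment.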


Table 5

ASV System Comparisons
Overall

                         BISS-ADM              CIC                   DSG (VVU)

Computer:                980B                  980A                  990/10, /10
                                                                     (990/10, /12, /5)

Operating System:        Modification of       RTM-I                 DX-10, TX-990
                         RTM-I                                       (DX-10, TX-990, TIPMX)

Prog. Languages:         Fortran, 980 Assem    Fortran, 980 Assem    PASCAL, 990 Assem

Entry Control:           No Booth              Booth with Scale      Booth with Scale

  Checks:                • None                • Multiple Entrants   • Multiple Entrants
                                               • Security Level      • Security Level
                                                                     • Time of Day
                                                                     • Inventory

Training Mode            No                    No                    No
(Mini-Enrollment)

Validation Mode          No                    No                    No
(Verify; No Statistics)

Statistics/Report        None                  Some                  Extensive
Generation

Table 6

ASV System Comparisons
Speech Input

                  BISS-ADM                  CIC                       DSG (VVU)

Prompting Set:    "North Lawn               "Good Bruce               "Good Bruce
                  Great Camp"               Called Hard"              Called Hard"

Filters:          Digital                   Digital                   Analog

                  • Prog. Gain Control      • No Prog. Gain Control   • No Gain Control

                  • Overload Flag           • No Overload Flag        • No Overload Flag

                  • Constant CF Spacing     • Constant CF Spacing     • CF Spacings Increase
                                                                        with Frequency

                  • Constant BWs            • Constant BWs            • BWs Increase
                                                                        with Frequency

                  • 16 Filts (Discard       • 16 Filts (Discard       • 14 Filters
                    15, 16)                   15, 16)

Preprocessing: 

•  Regression 

Vectors: 

Legendre  Polys. 

Both  BISS  and 

SIN/COS 

•  Regression 

DSG-Type 

Yes 

Limiting: 

No 

Preprocessings 

Max  of  Flits  (2-13) 

•  Energy: 

Max  of  14  Filters 

Used;  Energy  Calc. 

Ave  Flits  (1-14) 

•  Normalization:  By  the  Mean 

only  DSG  Type 

by  the  St.  Dev. 

•  Quantization 

Thresholds:  All  Male  Design  Set 

Discard  Lowest  4  Filters 


Male/Female  Design  Set 
Filters  Selected  by  Vowel 


Table 7

ASV System Comparisons
Enrollment

                           BISS-ADM          CIC               DSG (VVU)

Phrase Prompting Order:    Random            Random            Fixed

Auto Termination:          No; Operator      Yes; Variety of   Yes; Variety of
                           Controlled        Criteria          Criteria

Max Acceptable e:          ∞                 160 (BISS)        200
                                             200 (DSG)

Reference File             3 Int.            3 Int.            3 Int.
Precision per Element      5 Frac.           3 Frac.           4 Frac.
(Frac. Bits for
Updating)


TABLE  9.  PHRASE  PROMPTING  ORDER  DURING  ENROLLMENT 


1.  PROUD  BEN  SWAM  NEAR 

2.  GOOD  BRUCE  CALLED  HARD 

3.  STRONG  JOYCE  CAME  NORTH 

4.  YOUNG  JEAN  SERVED  HIGH 

5.  STRONG  JOYCE  CALLED  HARD 

6.  GOOD  BRUCE  CAME  NORTH 

7.  YOUNG  JEAN  SWAM  NEAR 

8.  PROUD  BEN  SERVED  HIGH 

9.  YOUNG  JOYCE  CAME  NORTH 

10.  GOOD  BEN  SWAM  NEAR 

11.  STRONG  JEAN  SERVED  HIGH 

12.  PROUD  BRUCE  CALLED  HARD 

13.  STRONG  JEAN  SWAM  NEAR 

14.  YOUNG  JOYCE  CALLED  HARD 

15.  PROUD  BRUCE  CAME  NORTH 

16.  GOOD  BEN  SERVED  HIGH 


17.  STRONG  JOYCE  CAME  NEAR 

18.  GOOD  BRUCE  CALLED  HIGH 

19.  YOUNG  JEAN  SERVED  HARD 

20.  PROUD  BEN  SWAM  NORTH 

21.  GOOD  BRUCE  CAME  NEAR 

22.  PROUD  BEN  SERVED  HARD 

23.  YOUNG  JEAN  SWAM  NORTH 

24.  STRONG  JOYCE  CALLED  HIGH

25.  GOOD  BEN  SWAM  NORTH 

26.  YOUNG  JOYCE  CAME  NEAR 

27.  STRONG  JEAN  SERVED  HARD 

28.  PROUD  BRUCE  CALLED  HIGH 

29.  STRONG  JEAN  SWAM  NORTH 

30.  GOOD  BEN  SERVED  HARD 

31.  PROUD  BRUCE  CAME  NEAR 

32.  YOUNG  JOYCE  CALLED  HIGH 


B.  SPEECH  PREPROCESSING 

Except for the LPC-based experiments described in Section VII, all
the speech processing algorithms described in this report are based on
the relative spectrum of speech as a function of time, as represented by
the energies out of a bank of bandpass filters, sampled every 10 ms,
which has been preprocessed as described in this subsection. In order
to simplify the notation, the time index (j) has been eliminated from
the equations in this section.


1.  FILTER  BANK  DEFINITIONS 

This section briefly distinguishes among some of the filter banks
used in recent speaker verification work at Texas Instruments.

The first column in Table 10 contains the design center frequencies
(165.3 Hz spacing) for the two-stage (two poles per stage) digital
filters used in each of the two filter banks delivered with the BISS
ADM. For the BISS ADM the center frequencies of each of the two stages
were slightly offset to produce a rather flat-topped amplitude response.
The sample period for the actual filters delivered, however, was 106
instead of 100 microseconds, resulting in center frequencies about 6%
lower than those in the design. These lower center frequencies were
actually used in all of the MITRE testing and still reside in the ADM.
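
The size of the shift follows from the fact that the center frequency of a digital filter with fixed coefficients scales inversely with the sample period. As a worked example (an illustration, not a measurement from this report), for the first BISS ADM filter:

```latex
f_{\text{actual}} = f_{\text{design}} \times \frac{100\ \mu\text{s}}{106\ \mu\text{s}}
                  \approx 0.943\, f_{\text{design}},
\qquad
410\ \text{Hz} \times 0.943 \approx 387\ \text{Hz}
```

This is a reduction of about 5.7%, consistent with the "about 6%" figure quoted above.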

For the Total Voice contract, the center frequencies and bandwidths
of the filters in one of the filter banks (Channel #2) in the BISS ADM
were changed by replacing the PROMs which contained the filter
coefficients. The Total Voice filter coefficients were defined such that
the center frequencies of each of the two stages were the same for each
of the filters. This yielded round-shouldered filter amplitude
responses. The same 106 vs 100 microsecond sample period disparity
remained, however, yielding the measured values shown in Table 10.

The DSG filter banks (also used on the VVU system) were an
outgrowth of the Total Voice filters. The first step was to define a
filter bank having a set of iteratively defined center frequencies and
bandwidths. After this was done, experiments to be described in the
next section showed that speaker verification performance improved as
the bandwidths of the filters decreased. The result was the 14-channel
DSG filter bank, as defined by the first fourteen entries in the final
column of Table 10. The iterative definition of these parameters is
given by


$$BW_{n+1} = BW_n \cdot 2^{1/13}$$

$$CF_{n+1} = CF_n + (CF_n - CF_{n-1}) \cdot 2^{1/6}$$

where $BW_1 = 200$, $CF_1 = 350$, and $CF_0 = 350 - 100/2^{1/6}$.

TABLE 10.  FILTER BANK DEFINITIONS

             BISS ADM        TOTAL VOICE     TOTAL VOICE
             DESIGNED        DESIGNED        MEASURED        DSG/VVU

FILTER #      CF     BW       CF     BW       CF     BW        CF     BW

    1         410    290      350    300      280    250       350    200
    2         575    290      450    300      395    280       450    211
    3         741    290      555    310      525    310       562    223
    4         906    290      670    340      630    340       688    235
    5        1071    290      790    380      750    360       830    248
    6        1237    290      940    400      900    360       988    261
    7        1402    290     1120    400     1080    360      1167    275
    8        1567    290     1320    400     1265    365      1367    290
    9        1733    306     1550    400     1480    365      1591    306
   10        1898    290     1810    400     1725    365      1843    323
   11        2063    290     2100    400     1985    365      2126    341
   12        2229    290     2430    400     2285    360      2443    360
   13        2394    290     2800    400     2640    365      2800    379
   14        2559    290     3350    700     3150    625      3200    400
   15       *2725    290     4000    700     3720    635    **3649    422
   16       *2890    290     4650    700     4235    615   ***4153    445

*    Not used in ADM, although implemented in actual filters.
**   As defined for CCD analyzer; does not exist in DSG filter board.
***  Cut-off freq for high pass filter as defined for CCD analyzer;
     does not exist on DSG board.
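
The iterative definition of the DSG filter bank can be checked against the final column of Table 10 with a short calculation. The sketch below is an illustration only: it assumes the recursion multiplies each bandwidth by 2^(1/13) and each center-frequency spacing by 2^(1/6), starting from a 350 Hz center frequency, a 100 Hz first spacing, and a 200 Hz bandwidth. The function name is hypothetical.

```python
def dsg_filter_bank(n_filters=14):
    """Return a list of (center frequency, bandwidth) pairs in Hz."""
    ratio_cf = 2 ** (1 / 6)    # growth of the spacing between center frequencies
    ratio_bw = 2 ** (1 / 13)   # growth of the bandwidth from filter to filter
    cf_prev = 350 - 100 / ratio_cf  # CF_0, chosen so that CF_2 - CF_1 = 100 Hz
    cf = 350.0                      # CF_1
    bw = 200.0                      # BW_1
    bank = []
    for _ in range(n_filters):
        bank.append((cf, bw))
        # The right-hand side is evaluated before assignment, so the old
        # values of cf and cf_prev are used to form the next center frequency.
        cf, cf_prev = cf + (cf - cf_prev) * ratio_cf, cf
        bw *= ratio_bw
    return bank


if __name__ == "__main__":
    for number, (cf, bw) in enumerate(dsg_filter_bank(), start=1):
        print(f"{number:2d}  CF = {cf:7.1f} Hz   BW = {bw:6.1f} Hz")
```

Rounded to the nearest hertz, the fourteen center frequencies produced this way match the DSG/VVU column of Table 10. The spacing doubles every six filters and the bandwidth doubles over the full bank, from 200 Hz at filter 1 to 400 Hz at filter 14.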


2.  REGRESSION


It has been found that by eliminating the gross aspects of the
spectrum, such as the slope and curvature, more clearly defined formant
frequencies are obtained. This can be accomplished by regression. The
BISS-ADM utilized the first two Legendre polynomials for regression;
however, the DSG and VVU systems regressed the spectral amplitude vector
using the first three elements of an orthonormal sine/cosine basis set,
defined as follows:


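$$r_0(i) = \frac{1}{\sqrt{14}}$$

$$r_1(i) = 0.559065 - 0.8763356 \, \sin\left(\frac{(i-0.5)\,\pi}{14}\right)$$

$$r_2(i) = -\frac{1}{\sqrt{7}} \, \cos\left(\frac{(i-0.5)\,\pi}{14}\right)$$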

The  regression  coefficients  are  then  defined  as 

$$C_k = \sum_{i=1}^{14} a(i) \, r_k(i), \qquad k = \{0, 1, 2\}$$

where  a(i)  is  the  magnitude  of  the  output  of  filter  (i).  [The  BISS 
specification  in  Appendix  III  denotes  this  as  f(i).]  The  regressed  out¬ 
put  (denoted  g(i)  in  Appendix  III)  is  then  determined  from 

$$a_R(i) = a(i) - \left[ C_0 \, r_0(i) + C_1 \, r_1(i) + C_2 \, r_2(i) \right]$$


In the actual implementation, however, scaling such as described in
section 10.2.2.2.1.1 of Appendix III is done to preserve precision in the
calculation. This is accomplished by redefining $C_1$ and $C_2$ as

$$C'_1 = \sum_{i=1}^{14} \frac{a(i) \, \phi_1(i)}{32768}, \qquad
  C'_2 = \sum_{i=1}^{14} \frac{a(i) \, \phi_2(i)}{32768}$$

and

$$a_R(i) = a(i) - C_0 \, r_0(i) - \frac{C'_1 \, \gamma_1(i)}{32768} - \frac{C'_2 \, \gamma_2(i)}{32768}$$

where

$$\phi_1(i) = \frac{32768 \cdot 32768}{0.5 \cdot 30211} \, r_1(i), \qquad
  \phi_2(i) = \frac{32768 \cdot 32768}{0.5 \cdot 24615} \, r_2(i)$$

$$\gamma_1(i) = 0.5 \cdot 30211 \cdot r_1(i), \qquad
  \gamma_2(i) = 0.5 \cdot 24615 \cdot r_2(i)$$

The phis and gammas are given in Table 11, which corresponds
functionally to Table VII in Appendix III.

TABLE 11.  REGRESSION VECTORS

    i      phi_1(i)    phi_2(i)    gamma_1(i)    gamma_2(i)

    1        32767      -32767         6963         -4622
    2        19168      -31125         4073         -4390
    3         6603      -27920         1403         -3938
    4        -4304      -23316         -914         -3288
    5       -13000      -17543        -2762         -2474
    6       -19056      -10891        -4049         -1536
    7       -22158       -3692        -4708          -520
    8       -22158        3692        -4708           521
    9       -19056       10892        -4049          1537
   10       -13000       17543        -2762          2475
   11        -4304       23316         -914          3289
   12         6603       27920         1403          3939
   13        19168       31125         4073          4391
   14        32767       32767         6963          4623


Thus the regression tends to flatten the spectrum, removing any
half-cycle sine or cosine wave trends of the spectrum. An example of a
spectral waveform having a large positive $C_1$ is a nasal, which has
one peak near the low end and one near the high end of the spectrum
(around 250 Hz and 2200 Hz). An example of a spectral waveform with a
large positive $C_2$ is a sibilant, having most of its energy above
3000 Hz. Most vowels, however, have the opposite spectral tilt because
of the glottal source spectral decay with increasing frequency, yielding
a large negative value of $C_2$.
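
The unscaled form of the sine/cosine regression can be sketched in a few lines. This is a floating-point illustration, not the fixed-point code of the delivered system. It assumes the basis vectors are a constant, a half-cycle sine with its mean removed, and a half-cycle cosine normalized by the square root of 7; the function names are hypothetical.

```python
import math

N_FILTERS = 14


def basis(i):
    """Return (r0, r1, r2) for filter index i = 1..14."""
    angle = (i - 0.5) * math.pi / N_FILTERS
    r0 = 1.0 / math.sqrt(N_FILTERS)
    r1 = 0.559065 - 0.8763356 * math.sin(angle)
    r2 = -math.cos(angle) / math.sqrt(7.0)
    return r0, r1, r2


ROWS = [basis(i) for i in range(1, N_FILTERS + 1)]


def regress(a):
    """Return ([C0, C1, C2], regressed spectrum) for a 14-element spectrum a."""
    coeffs = [sum(a[i] * ROWS[i][k] for i in range(N_FILTERS)) for k in range(3)]
    a_r = [a[i] - sum(coeffs[k] * ROWS[i][k] for k in range(3))
           for i in range(N_FILTERS)]
    return coeffs, a_r


def gammas(i):
    """Scaled basis values gamma_1(i), gamma_2(i), before conversion to integers."""
    _, r1, r2 = basis(i)
    return 0.5 * 30211 * r1, 0.5 * 24615 * r2
```

A perfectly flat spectrum is removed almost entirely by the constant term, and a spectrum that rises steadily with filter number yields a positive $C_2$, in line with the sibilant example above. For the entries spot-checked while preparing this sketch, the gamma values fall within about two counts of Table 11; the exact rounding used to build the table is not stated.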

It has been observed, however, that regression sometimes eliminates
too much of the variance of the spectrum. Hence, it was decided that if
the postregression variance were less than some fraction of the
preregression variance, the magnitudes of the regression coefficients
would be reduced sufficiently to limit the variance after regression to
that fraction of the preregression variance. Stated another way, the RMS
value of the amount regressed out is limited to some (other) fraction of
the preregression variance. If in fact the regression coefficients are
too large, they are reduced as follows:

$$C_{1R} = C_1 \cdot \frac{K \cdot (\text{pre-regression variance})}{\sqrt{C_1^2 + C_2^2}}$$

where

$$\text{pre-regression variance} = \sqrt{\sum_{i=1}^{14} a(i)^2 - C_0^2}$$

$$K = \sqrt{1 - \frac{13}{11} \, \text{ratio}_{\min}^{2}}$$

and $\text{ratio}_{\min}$ is the minimum allowable ratio of
postregression variance to preregression variance. In actually
implementing the regression limiting, intermediate scalings were
performed using $C'_1$ and $C'_2$ to preserve as much precision as
possible. Although the need to limit regression may exist for regression
using any set of basis vectors, such limiting was done only when
sine/cosine basis functions were used. Regression limiting was not done
for the original BISS algorithm, which used Legendre polynomial basis
functions.

3.  NORMALIZATION 

The next step in the preprocessing is to normalize the regressed
values of the filter outputs to accommodate the wide variety of vocal
efforts among speakers. In the original BISS algorithm, normalization
of each spectral section was done by dividing by the mean of all the
filter outputs in the spectral section. However, when the regression
was done using the sine/cosine basis functions, normalization was done
by dividing by the postregression standard deviation, given by

$$\text{postregression variance} = \sqrt{\frac{1}{11} \left[ \sum_{i=1}^{14} a(i)^2 - C_0^2 - C_{1R}^2 - C_{2R}^2 \right]}$$

plus a constant that was added to avoid large normalized amplitudes
during times of very small standard deviations (i.e., during silence
intervals). Whenever regression limiting occurred, the postregression
standard deviation became $\text{ratio}_{\min}$ (discussed in the prior
subsection) times the preregression standard deviation.
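
The limiting and normalization steps can be sketched together. Several things here are assumptions made for illustration: the limiting factor K is passed in as a parameter rather than derived from the minimum ratio, the same scale factor is applied to both coefficients, and the additive constant (called `floor` below) is given an arbitrary default value. The function names are hypothetical.

```python
import math


def limit_coefficients(c1, c2, pre, k):
    """Scale (c1, c2) so that sqrt(c1^2 + c2^2) does not exceed k * pre."""
    magnitude = math.hypot(c1, c2)
    allowed = k * pre
    if magnitude <= allowed or magnitude == 0.0:
        return c1, c2            # coefficients are already small enough
    scale = allowed / magnitude  # same factor for both (assumed)
    return c1 * scale, c2 * scale


def post_regression_sd(a, c0, c1r, c2r):
    """Residual energy after regression, averaged over 11 degrees of freedom."""
    residual = sum(x * x for x in a) - c0 ** 2 - c1r ** 2 - c2r ** 2
    return math.sqrt(max(residual, 0.0) / 11.0)


def normalize(a_r, sd, floor=1.0):
    """Divide by the standard deviation plus a constant (value assumed)."""
    return [x / (sd + floor) for x in a_r]
```

The added constant matters mainly during silence: when the standard deviation is near zero, the divisor is dominated by the constant and the normalized amplitudes stay small instead of amplifying noise.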


4.  QUANTIZATION 

The regressed, normalized filter (i) energies for a spectral section
are next quantized to one of eight levels according to a set of
quantization thresholds, $\Phi_{i,q}$:

$$a_Q(i) = \begin{cases}
0 & \text{for } a_N(i) \le \Phi_{i,1} \\
q & \text{for } \Phi_{i,q} < a_N(i) \le \Phi_{i,q+1}, \qquad q = \{1, 2, \ldots, 6\} \\
7 & \text{for } \Phi_{i,7} < a_N(i)
\end{cases}$$

The selection of the thresholds used is clearly a function of the
processing that precedes this quantization step, i.e., the center
frequencies of the filter bank, the regression, and the normalization.
Hence, for each change in any of these steps, the quantization levels
must be recalculated.
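
Eight-level quantization against a per-filter threshold list can be sketched with a binary search. The sketch assumes seven ascending thresholds per filter, with a value exactly equal to a threshold falling in the lower level. The threshold values in the usage example are invented for illustration and are not those of Table 16.

```python
from bisect import bisect_left


def quantize(a_n, thresholds):
    """Map a normalized filter energy to a level 0..7.

    thresholds: seven ascending values for one filter. The level is the
    number of thresholds strictly below a_n.
    """
    if len(thresholds) != 7:
        raise ValueError("expected seven thresholds per filter")
    return bisect_left(thresholds, a_n)


if __name__ == "__main__":
    example = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]  # hypothetical thresholds
    print([quantize(x, example) for x in (-2.0, -1.2, 0.25, 2.0)])
```

Because each filter has its own threshold list, the same normalized value can map to different levels in different filters.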

In  originally  determining  the  quantization  thresholds  used  for 
BISS,  a  special  data  set  was  used,  consisting  of  one  repetition  by  each 
of  21  males  of  the  set  of  words  given  in  Table  12. 

TABLE  12.  WORD  SET  #1  USED  IN  DETERMINING  QUANTIZATION  THRESHOLDS 

Bid  Bed  Bored 

Bead  Bade  Bode 

Bide  Booed  Bud 

However, since this set of words did not include all of the vowels, and
since there were no females in the data set (a possible reason for
poorer performance for females in prior speaker verification
experiments), a new set was collected consisting of one repetition by
each of 11 males and 11 females of the set of words given in Table 13.

TABLE  13.  WORD  SET  #2  USED  IN  DETERMINING  QUANTIZATION  THRESHOLDS

*Pot    *Bert   *Bet    *Bought

*Put    Bout    Bait    *Beet

Boyd    *Bat    Boat    *But

Butte   Bite    *Bit    *Boot

The quantization thresholds were then chosen by plotting histograms
for each of the regressed, normalized filter outputs ($a_N(i)$'s) during
the vowel portions of the words. Rather than having these quantization
levels chosen to yield a uniform probability, it seemed more desirable
to cluster the quantization thresholds at higher energy levels. In this
way the sensitivity to noise can be reduced and the quantization
resolution increased. Two schemes were used to exclude the low-energy
filters. One was simply to exclude the four lowest energy filters at
each time sample during the vowel. (This was the scheme used to
determine thresholds for the original BISS algorithm.) Another method
was to more selectively include only energies from certain filters on
the basis of the vowel's identity. In this second method, since the
identities of the high energy filters are a function of the center
frequencies of the particular filter bank used, the high amplitude
filters were determined from visual inspection of spectrograms. The
resulting choices for high amplitude filters are given in Tables 14 and
15 for the BISS and for the VVU filter banks, respectively.

In an effort to compare both the two methods of generating quantization
thresholds and the two data sets, the words with common vowels from both
data sets were processed using the BISS filter simulation and IPMOD2
preprocessing. Figures 18A and 18B compare the old and the new data sets
using only the male speakers and the ten highest amplitude filters for
each word. Figures 18A and 18B compare the resulting thresholds for the
new data set using the ten highest amplitude filters for males alone
(18A) to those for both males and females (18B). Figures 18B and 18D
compare (for all speakers in the new data set) using the ten highest
amplitude filters to using selected high amplitude filters, as given in
Table 14.

In  all  subsequent  experiments  discussed  in  Sections  V  and  VI,  the 
new  data  set,  with  both  male  and  female  speakers,  was  used  to  determine 
quantization  thresholds  using  the  selected  filters  of  Tables  14  and  15. 
However,  in  these  subsequent  experiments,  only  the  pure  vowels  (*  in 
Table  13)  were  used. 

The particular quantization thresholds used in the VVU system are
given in Table 16, which corresponds functionally to those given in the
BISS specification (Appendix III). The differences in the conditions
under which these two sets of thresholds were determined are shown in
Table 17.
Table 14

High Amplitude Filters for BISS Filter Bank

[Table not reproduced: the column positions of the marks are illegible in
the source scan. Columns are filter numbers 1 through 14. Rows are the
vowel words beet, bit, bet, bat, pot, bought, but, put, boot, and Bert,
each with one row for male and one row for female speakers. An X marks
each filter selected as a high amplitude filter for that row.]
Table 16

QUANTIZATION THRESHOLDS
DSG FILTER BANK
NORMALIZING BY STANDARD DEVIATION

Filter                       Quantization Level
Number       1       2       3       4       5       6       7

   1     -2330   -1489    -761      10     583    1120    1940
   2       -76     531    1222    1844    2529    3195    4088
   3      -390      59     577    1103    1744    2518    3424
   4      -206     479    1071    1704    2353    3062    4047
   5      -194     420    1010    1586    2086    2854    3851
   6      -998    -139     496    1032    1629    2246    2990
   7     -1466    -561     -30     501    1006    1753    2660
   8     -1020    -610    -165     278     795    1518    2522
   9     -1258    -750    -283     136     681    1539    2437
  10     -1282    -749    -164     364    1139    1931    2863
  11      -568     174     796    1396    2010    2702    3750
  12      -374     188     744    1312    1934    2813    3760
  13      -660    -167     260     726    1231    1871    2749
  14     -1933   -1298    -830    -453  [illegible]  468    1113
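
The use of one row of Table 16 can be illustrated with a minimal sketch. It assumes, since the report does not say so explicitly, that the seven thresholds split the range of a normalized filter output into eight levels numbered 0 through 7, and that a value equal to a threshold falls in the upper level. The function name `quantize` is hypothetical.

```python
from bisect import bisect_right

# Thresholds for DSG filter 1, copied from Table 16.
FILTER_1_THRESHOLDS = [-2330, -1489, -761, 10, 583, 1120, 1940]


def quantize(value, thresholds):
    """Return the level index (0..len(thresholds)) for one filter output.

    Assumption: level numbering starts at 0 and a value equal to a
    threshold belongs to the level above that threshold.
    """
    return bisect_right(thresholds, value)


levels = [quantize(v, FILTER_1_THRESHOLDS) for v in (-3000, -1000, 0, 600, 5000)]
print(levels)
```

Seven thresholds give eight levels, which would fit in three bits per filter output; that packing is an inference and is not stated in this section.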
TABLE 17. CONDITIONS FOR DETERMINING QUANTIZATION THRESHOLDS

Condition                 BISS               VVU

Filter bank (Table 10)    BISS filters       DSG filters
Data set as per           Table 12           Table 13 (only 10 vowels)
Speaker population        21 males           11 males, 11 females
Filters used              10 high energy     Selected as per Table 15
Regression                Legendre polyn.    Sine/cosine
Regression limiting       no                 yes
Normalization by          mean               standard deviation


5. ENERGY


For each time sample, a measure of the energy was also computed.
In the BISS algorithm, this was just the maximum energy from the
fourteen filters. As an aid to distinguishing vowels from nasals [which
usually have most of their energy in a(1)] and vowels from sibilants
[which usually have most of their energy in a(14)], these two filters
were not used in calculating the new energy measure given by


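[Equation (Def. 1) illegible in the source scan. Recoverable elements:
E_1 is a square-root expression built from sums of the filter outputs
a(i) over i = 2, ..., 13, with a factor 1/11.]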


During  the  experiments  described  in  Section  V,  it  was  determined 
that  the  following  energy  definition  gave  still  better  definition  of  the 
vowel  boundaries. 


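[Equation (Def. 2) illegible in the source scan. Recoverable elements:
E_2 is built from the maximum of the a(i) over i = 2, ..., 13; the
constants 1/8 and 14 appear in the expression.]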


C.  VERIFICATION  PROCESSING 


After the input speech data have been preprocessed, they are compared
to reference patterns for the four given words by "scanning" the
reference patterns across the input data and calculating a "scanning
error" for each centisecond of data. Minima (valleys) in these scanning
errors are saved and an optimal sequence of valley points is found using
a partial sequence error (called a "point-pair error") between adjacent
valleys. After an optimal sequence of valley points is found, a decision
function is calculated and is compared to a decision threshold.
This subsection will describe the changes that have been made in the
point-pair error calculation, in the decision strategy, in the decision
function calculation, and in various parameters used in these
calculations.
1. POINT-PAIR ERROR CALCULATION


This subsection gives the equations used in the point-pair error
calculations for various systems.


BISS:

$$EW = \frac{(e(i) + 1)\,(e(i+1) + 1)}{2048}\left(1 + \frac{|dt - \mathrm{dthat}|}{\max(\mathrm{dthat},\ \mathrm{min\ dthat})}\right)$$

where: 0.5 x dthat < dt < 2 x dthat; min dthat = 20; max EW = 70;

CIC:

$$EW = \left(\frac{e(i)}{10} + 1\right)\left(\frac{e(i+1)}{10} + 1\right)\left(1 + \frac{|dt - \mathrm{dthat}|}{\max(\mathrm{dthat},\ \mathrm{min\ dthat})}\right)$$

where: 0.5 x dthat < dt < 2 x dthat; min dthat = 20; max EW = 1500

DSG/VVU:

$$EW = \frac{(e(i) + K)\,(e(i+1) + K)}{1024}\left(1 + \frac{|dt - \mathrm{dthat}|}{\max(\mathrm{dthat},\ \mathrm{min\ dthat})}\right)$$

where: 0.33 x dthat < dt < 2 x dthat; min dthat = 10; max EW = 200;
K = 100;

To check that the max EWs are consistent among all the equations,
find the e(i)'s under the conditions of dt = dthat, e(i) = e(i+1), and
EW = max EW. This yields e(i) = 377.3 for CIC, e(i) = 378.6 for BISS,
and e(i) = 352.5 for DSG/VVU. This lower e(i) for the DSG/VVU system
is due to the higher "K," which tends to decrease the sensitivity of EW
to low valley point errors. The benefit of this decrease in sensitivity
is to keep very low e(i)'s from allowing very high e(i+1)'s.
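
The consistency check can be reproduced with a short calculation. The sketch below assumes the equations take the form (e + K)^2 / divisor = max EW when dt = dthat and e(i) = e(i+1), with the CIC form (e/10 + 1)^2 rewritten as (e + 10)^2 / 100. The helper name `e_at_max_ew` is hypothetical.

```python
import math


def e_at_max_ew(max_ew, divisor, k):
    """Solve (e + k)**2 / divisor == max_ew for e (dt = dthat, equal e's)."""
    return math.sqrt(max_ew * divisor) - k


cic = e_at_max_ew(max_ew=1500, divisor=100, k=10)    # (e/10 + 1)**2
biss = e_at_max_ew(max_ew=70, divisor=2048, k=1)     # (e + 1)**2 / 2048
dsg = e_at_max_ew(max_ew=200, divisor=1024, k=100)   # (e + 100)**2 / 1024

print(round(cic, 1), round(biss, 1), round(dsg, 1))
```

Under these assumptions the CIC and DSG/VVU values agree with the 377.3 and 352.5 quoted in the text. For BISS the equation as written gives about 377.6; the quoted 378.6 equals the square root of 70 x 2048, that is e(i) + 1, so it may not have had the additive constant subtracted.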


For reference purposes, the equations used in the two "Total Voice"
programs are given below. These programs use the optimal sequence
finder subroutine to find sequences of either 2 or 3 reference points in
words, rather than to find sequences of 4 points in a phrase, as in the
original BISS and VVU systems. The real difference, however, is that
the format and size of the "scanning patterns" are different. The BISS
system, the old CIC system, and both the DSG and VVU systems use 6 time
slices (samples) of 14 filters (84 elements total). The original total
voice program used 2 columns of 9 filters, 2 regression coefficients and
energy, plus the difference between these two 12-element columns (36
elements total). The revised total voice program used 5 columns of 14
filters, 2 regression coefficients and energy, plus the difference
between adjacent columns (153 elements total). Hence, scanning errors
will have a different range of values, as reflected by the chosen valley
point error maxima: 200 for the original total voice, 615 for the
revised total voice, and 400 (now 500) for BISS/CIC.
ORIGINAL TOTAL VOICE:

$$EW = \frac{(e(i) + K)\,(e(i+1) + K)}{2048}\left(1 + \frac{|dt - \mathrm{dthat}|}{\max(\mathrm{dthat},\ \mathrm{min\ dthat})}\right)$$

where: min dthat = 0; K = 40; max EW, dtmin & dtmax are word dependent.


REVISED TOTAL VOICE:

$$EW = \frac{(e(i) + K)\,(e(i+1) + K)}{1024}\left(1 + 2\,\frac{(dt - \mathrm{dthat})^{2}}{\max(\mathrm{dthat},\ \mathrm{min\ dthat})}\right)$$

where: min dthat = 4; K = 100; max EW, dtmin & dtmax are word dependent.

All of these equations can be rewritten in the following form, with
the constants specified in Table 18. Note that if K is chosen correctly
for a given filter bank and preprocessing, the C*EWMAX product remains
constant.

$$EW = \frac{1}{C}\left(\frac{e(i)}{K} + 1\right)\left(\frac{e(i+1)}{K} + 1\right)\left(1 + \beta\,\frac{|dt - \mathrm{dthat}|^{\alpha}}{\max(\mathrm{dthat},\ \mathrm{min\ dthat})}\right)$$


TABLE 18. POINT-PAIR ERROR PARAMETERS

SYSTEM    C         K      β    α    min dthat    max EW

CIC       1         10     1    1    20           1500
BISS      2048      1      1    1    20           70
DSG       0.1024    100    1    1    10           70
OTV       1.28      40     1    1    0            *
RTV       0.1024    100    2    2    4            *

*  Word dependent
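
The common form and the Table 18 constants can be exercised with a minimal sketch. It assumes the exponent α applies to |dt - dthat| only (the scan leaves its placement ambiguous), and it does not apply the max EW limit or the allowed range of dt. The function name `point_pair_error` and the sample inputs are hypothetical.

```python
def point_pair_error(e_i, e_next, dt, dthat, C, K, beta, alpha, min_dthat):
    """Point-pair error in the common form, using Table 18 constants.

    Assumption: alpha is an exponent on |dt - dthat| alone.
    """
    timing = 1 + beta * abs(dt - dthat) ** alpha / max(dthat, min_dthat)
    return (e_i / K + 1) * (e_next / K + 1) * timing / C


# DSG row of Table 18.
DSG = dict(C=0.1024, K=100, beta=1, alpha=1, min_dthat=10)

# Made-up inputs: both valley errors 100, measured spacing 45, expected 30.
ew_general = point_pair_error(100, 100, dt=45, dthat=30, **DSG)

# The same case through the explicit DSG/VVU equation.
ew_explicit = (100 + 100) * (100 + 100) / 1024 * (1 + abs(45 - 30) / max(30, 10))

print(ew_general, ew_explicit)
```

Both routes give about 58.6 for these inputs, which illustrates how C = 0.1024 follows from dividing 1024 by K squared.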

One interpretation of K is for it to be the ehat averaged over the
population (<ehat>). In trying to establish the values of the ehats
averaged over the population, we can refer to the average ehats gathered
from the CIC systems and from laboratory experiments, as given in Table
19.
TABLE 19. AVERAGE EXPECTED SCANNING ERRORS

                                  BISS FILTER    BISS FILTER    DSG FILTER
                                  BISS PREPRO    IPM2 PREPRO    IPM2 PREPRO

TEST SET          MALES               109            130            117
AFTER ENRLMNT     FEMALES             121            148            135

TEST SET          MALES                91            109            101
AFTER 21 SESS.    FEMALES             105            126            118

CIC OVERALL (MALES & FEMS)             94            125             94

VVU OVERALL (MALES & FEMS)              -              -            101

Since the average ehats clearly depend upon filter bank and
preprocessing method, the Ks should also reflect this difference. Note
that the average ehats are further dependent upon sex and word
(reference Section IX), and hence consideration should be given in the
future to at least making K dependent upon the user's sex.

By the interpretation of K as being <ehat>, the K of 100 used for
the DSG/VVU system is certainly reasonable.

2.  DECISION  STRATEGY 

In the BISS algorithm, one session of up to four phrases was used
to make a decision to accept the user. If the person were not accepted,
a second (independent) session of up to four phrases was used, with
tighter thresholds than those used in the first session. However, the
algorithm has since been revised so that rather than totally
disregarding the first four phrases spoken by the user, either all
spoken phrases (up to four) would be used in the decision process with
normal decision thresholds or, alternately, only the "best" (of up to
seven) phrases would be used, in which case more restrictive
(prejudiced) thresholds would be used in the decision.

Figure 19 is a flow chart of the revised algorithm. In the revised
algorithm, after the Nth phrase is processed (N ≤ 4), if all phrases
have been "registered" (a match has been made between the input and the
reference patterns for all of the prompted words), a decision function
(see next subsection) is calculated using appropriate parameters and is
compared to a "normal" mode decision threshold. If the decision
function is less than the threshold, the user has been verified; if not,
the worst phrase is excluded and the decision function is recalculated
and compared to a "prejudiced" mode decision threshold. This procedure
is repeated until either the user has been verified or all phrases have
been excluded. Then, if the user has not been verified, if fewer than
seven phrases have been prompted, and if no more than three phrases have
been misregistered, another phrase is prompted and the decision
procedure is repeated.

Figure 19. CIC Verification Strategy Overview

[Flow chart not reproduced. Legend recovered from the figure:]

  N_R     = number of phrases not registered (maximum number of
            misregistered phrases allowed: 3)
  N_RG    = number of unique registered phrases
  N_P     = total number of phrases prompted
  NP_i    = total number of times the ith phrase is prompted
  REG_i   = .TRUE. when phrase i is registered, .FALSE. otherwise
  USED_i  = .TRUE. if phrase i is used for decision function computation
  N       = number of phrases used in computing the decision function
  d_T     = decision threshold
  D_N     = decision thresholds as a function of the number of phrases
  D_N^P   = prejudiced decision thresholds as a function of N
  k       = the number of the used phrase with maximum EP_k/EXP_k

If N is larger than four, or if one or more of the phrases was not
registered, the "prejudiced" mode thresholds are used from the start,
where the best (four or fewer) registered phrases are used in the
calculation of the decision function. If the user is not verified, the
worst phrase, as before, is excluded and the decision function is
recalculated. This procedure is repeated until either the user has been
verified or all phrases have been excluded. Then, if the user has not
been verified, fewer than seven phrases have been prompted, and no more
than three phrases have been misregistered, another phrase is prompted
and the decision procedure is repeated.
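
The exclusion loop described above can be sketched as follows. This is an illustration under stated assumptions, not the delivered algorithm: each registered phrase is summarized by a single ratio of observed to expected scanning error, the decision function is replaced by a hypothetical stand-in (100 times the mean ratio), and the thresholds are the Set 3 normal-mode and prejudiced-mode values from Table 20. The prompting of further phrases (up to seven, with at most three misregistrations) is left to the caller.

```python
# Table 20, Set 3, indexed by the number of phrases in the decision.
NORMAL = {1: 100, 2: 125, 3: 130, 4: 135}
PREJUDICED = {1: 85, 2: 110, 3: 120, 4: 130}


def decision_function(ratios):
    """Hypothetical stand-in for D: 100 times the mean error ratio."""
    return 100.0 * sum(ratios) / len(ratios)


def verify(ratios, all_registered=True):
    """Return (verified, phrases_used) for the registered phrases so far.

    ratios: assumed per-phrase observed/expected error ratios.
    """
    # Normal thresholds apply only on the first pass, and only when no
    # more than four phrases were spoken and all of them registered.
    normal = all_registered and len(ratios) <= 4
    used = sorted(ratios)[:4]          # keep at most the best four phrases
    while used:
        thresholds = NORMAL if normal else PREJUDICED
        if decision_function(used) < thresholds[len(used)]:
            return True, len(used)
        used.pop()                     # exclude the worst remaining phrase
        normal = False                 # later passes are prejudiced mode
    return False, 0


print(verify([1.0, 1.1, 1.2, 1.3]))
print(verify([0.8, 1.9]))
print(verify([1.2, 1.3]))
```

In this sketch a user with one very good phrase can still be verified after a poor phrase is excluded, but only against the tighter one-phrase prejudiced threshold.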

3.  DECISION  FUNCTIONS 


The  decision  function  that  was  used  for  both  the  BISS  and  the  CIC 
systems  was  of  the  form: 


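[Equation illegible in the source scan. Recoverable elements: D is
formed from the double sum of the scanning errors e_ik over the phrases
used (k = 1, ..., N) and the four words (i = 1, ..., 4), normalized by
the corresponding double sum of expected errors, subject to max/min
limits on the normalization, with the limiting values 140 (max) and 100
(min).]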

However,  it  was  felt  that  impostor  rejection  on  1-phrase  decisions  could 
be  improved  by  computing  the  min/max  limits  on  error  normalization  over 
the  entire  reference  file  rather  than  individually  on  each  subset  of 
phrases.  Hence,  a  new  decision  function  was  defined  as 

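[Equation illegible in the source scan. Recoverable elements: D' uses
the same double sums of e_ik and of the expected errors over k = 1, ...,
N and i = 1, ..., 4, together with a normalization term whose max/min
limits are taken over all 16 expected errors of the reference file
(k = 1, ..., 4 and i = 1, ..., 4).]

(For a 4-phrase decision, D' is identical to D.)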
4. DECISION PARAMETERS

Since different preprocessing methods yield different ranges of
"scanning errors," it became necessary to adjust the various parameters
in the decision function equation. In addition, the desire to improve
the selectivity of the "good" phrases led to decision parameters that
were a function of the number of phrases in the decision function.
Hence, there are more parameter variations than for BISS, as shown in
Tables 20 and 21.

TABLE 20. SETS OF DECISION FUNCTION THRESHOLDS USED

                                  BISS     Set 1     Set 2     Set 3

Post-enrollment:
  Normal & Prejudiced** Mode:
    1-3 Phrases:                     0         0         0         0
    4 Phrases:                     145       135  ->   145       145

Post-post-enrollment:
  Normal & Prejudiced Mode:
    1 Phrase:                        *        85        85        85
    2 Phrases:                       *       110       110       110
    3 Phrases:                       *       125  ->   120       120
    4 Phrases:                       *       135  ->   140       140

Normal:
  Normal Mode:
    1 Phrase:                      100       105       105  ->   100
    2 Phrases:                     120       125       125       125
    3 Phrases:                     135       130       130       130
    4 Phrases:                     145       135       135       135
  Prejudiced Mode**:
    1 Phrase:                       85        85        85        85
    2 Phrases:                     110       110       110       110
    3 Phrases:                     130       120       120       120
    4 Phrases:                     145       130       130       130

*  "Post-post-enrollment" did not exist for BISS.

** Prejudiced mode parameters are "auto-abort" parameters for BISS.
TABLE 21. SETS OF MAX/MIN SCANNING ERRORS ALLOWED (EMAX/EMIN)

                                  BISS     Set 1     Set 2     Set 3

Post-enrollment:
  Emax:  1-4 Phrases:              140       155       186       175
  Emin:  1-4 Phrases:              100       120       144       130

Post-post-enrollment:
  Emax:  1 Phrase:                   *       120       144       130
         2 Phrases:                  *       130       156       145
         3 Phrases:                  *       140       168       160
         4 Phrases:                  *       150       180       175
  Emin:  1 Phrase:                   *       100       120       100
         2 Phrases:                  *       105       126       110
         3 Phrases:                  *       110       132       120
         4 Phrases:                  *       115       138       130

Normal Mode:
  Emax:  1 Phrase:                 140       130       156       145
         2 Phrases:                140       135       162       155
         3 Phrases:                140       140       168       165
         4 Phrases:                140       145       174       175
  Emin:  1-4 Phrases:              100       100       120       115

*  "Post-post-enrollment" did not exist for BISS.

D.  OTHER  MODIFICATIONS 

There were two significant modifications other than the ones described
in this section. The first was the inclusion of a "post-post-enrollment"
mode for users who have had from 4 to 11 previous verifications. The
"normal" mode then becomes the mode for users when they have had twelve
or more verifications. These mode designations affect the choice of
decision thresholds and parameters, as explained earlier, and the choice
of updating factors for reference files. The scheme for updating is to
add 1/A times each spectral element of the "scanning" pattern formatted
from the input speech to (A-1)/A times each spectral element of the
reference scanning pattern. With the addition of the new mode, the
updating factors, A, become A=4 for "post-enrollment" mode (0-3 prior
verifications), A=8 for "post-post-enrollment" mode (4-11 prior
verifications), and A=16 for "normal" mode (>11 prior verifications).

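The updating scheme amounts to a running average whose memory lengthens as the user accumulates verifications. A minimal sketch, assuming the weight on the old reference is (A-1)/A so that the two weights sum to one, and treating a scanning pattern as a flat list of spectral elements (the function names and sample values are hypothetical):

```python
def updating_factor(prior_verifications):
    """Return A for the user's mode, per the ranges given in the text."""
    if prior_verifications <= 3:
        return 4       # "post-enrollment" mode
    if prior_verifications <= 11:
        return 8       # "post-post-enrollment" mode
    return 16          # "normal" mode


def update_reference(reference, observed, prior_verifications):
    """Blend the observed scanning pattern into the reference pattern."""
    a = updating_factor(prior_verifications)
    # new = (1/A) * observed + ((A - 1)/A) * reference, element by element
    return [(o + (a - 1) * r) / a for r, o in zip(reference, observed)]


new_user = update_reference([100, 200], [140, 160], prior_verifications=0)
old_user = update_reference([100, 200], [140, 160], prior_verifications=20)
print(new_user, old_user)
```

With the same input, the reference of a newly enrolled user moves a quarter of the way toward the observation, while that of an established user moves only one sixteenth of the way.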
The other algorithm modification was the inclusion of numerous
criteria to be used for terminating enrollment. (The BISS specification
included in Appendix III uses "terminating enrollment" for successful
completion of enrollment and "interruption" for the unsuccessful
completion of enrollment. In the DSG/VVU system, the corresponding terms
are "completion" and "termination," respectively. This discussion will
use the DSG/VVU terminology.) For BISS, the only method of termination
is by operator intervention. For the old 980-based CIC system, the
following criteria were added for the termination of enrollment:
1.  There is no speech input for three consecutive prompts.

2.  The average scanning error (difference between input and
reference) across all words exceeds 160 for preprocessing
using Legendre polynomial regression, or exceeds 200 for
preprocessing using sine/cosine regression.

3.  The number of times required in establishing or reestablishing
reference point data exceeds 11.

4.  The reestablishment of reference points is required for a
phrase (necessary when the same phrase does not register two
consecutive times) after twenty phrases in all have been
prompted.

For the DSG/VVU system, the following criteria were added to those used
in the old CIC system:

1.  There  is  no  speech  input  for  two  consecutive  prompts. 

2.  The  total  number  of  prompts  during  the  establishment  of  the 
initial  reference  point  data  exceeds  seven. 

3.  The  number  of  reprompts  of  the  same  phrase  during  the 
establishment  of  the  initial  reference  point  data  exceeds 
three. 

4.  The  number  of  reprompts  needed  to  reestablish  reference 
point  data  for  a  phrase  exceeds  four. 
SECTION V


ALGORITHM MODIFICATION TESTING


As shown in Figure 1 (Section I), the 980-based entry control system
installed at the Corporate Information Center at Texas Instruments
provided an excellent vehicle for testing the various algorithm
modifications described in Section IV. However, because of intersession
variation of the speakers (colds, maturity in system use, etc.), it soon
became evident that a more controlled set of repeatable data was needed
to perform algorithm tradeoffs. To this end, an enrollment session and
21 verification sessions were collected from 11 speakers (6 men, 5
women) for use in these tradeoff studies. This section first describes
experiments performed with the off-line data base, then those with the
operational entry control system.


A.  OFF-LINE  11-SPEAKER  TESTING 

1.  THE  DATA  SET 

The data used for all the off-line experiments in this section were
used as input to a variety of simulated filter banks and consisted of
one enrollment session of 48 phrases (two repetitions of a set of 24
phrases) and 21 verification sessions of 12 phrases (three repetitions
of a set of four phrases) collected from eleven speakers (6 males, 5
females) over a period of about one month (no more than two sessions per
day). An enrollment without any problems requires 20 phrases. The
amount collected provides one extra set of 4 phrases, and an extra
repetition for each phrase. The data were digitized at 12.5 kHz and then
filtered using a variety of simulated filter banks. The same data were
used in the LPC residual energy experiments described in Section VII of
this report.

For one-phrase decisions, these data provided 2772 (11*12*21) trials.
Since there were only eleven speakers, however, these are not 2772
independent trials.

The last four verification sessions (18 through 21) for each talker
were used as input for the impostor trials. Since there were 6 males
and 5 females, this yielded 200 (4*6*5 + 4*5*4) sessions, or 2400 trials
if each of the 12 phrases in each session were used in a one-phrase
decision. All the experiments reported in this section were for
one-phrase decisions.
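
The trial counts can be checked with a few lines of arithmetic. The sketch assumes, as the expression 4*6*5 + 4*5*4 implies, that each talker's four impostor sessions were tried against every other talker of the same sex; the variable names are illustrative.

```python
males, females = 6, 5
sessions, phrases_per_session = 21, 12
impostor_sessions_per_talker = 4          # sessions 18 through 21

# True-speaker trials for one-phrase decisions.
true_male = males * sessions * phrases_per_session
true_female = females * sessions * phrases_per_session
true_total = true_male + true_female

# Impostor sessions: each talker against every other talker of the same sex.
imp_sessions_male = impostor_sessions_per_talker * males * (males - 1)
imp_sessions_female = impostor_sessions_per_talker * females * (females - 1)
imp_male = imp_sessions_male * phrases_per_session
imp_female = imp_sessions_female * phrases_per_session

print(true_total, imp_sessions_male + imp_sessions_female, imp_male + imp_female)
```

The per-sex figures that result (1512 and 1260 true-speaker trials, 1440 and 960 impostor trials) are the same ones quoted with the verification results.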

2.  FILTER  BANK  DEFINITIONS 

This subsection provides a continuation of Table 10 given in Section
IV for the various filter banks. The first column pair in Table 22
again shows the design center frequencies for the two-stage digital
filters used for the BISS ADM. (Center frequencies of each of the two
stages were slightly offset.) The center frequencies and bandwidths for
the "Total Voice" work are shown in the second column pair.

The set of simulated filter banks used in the experiments in this
section were an outgrowth of the Total Voice filters, and hence were
referred to as TV prime (TVP), with the bandwidth of the initial filter
following (e.g., TVP200). These experiments showed that speaker
verification performance improved as the bandwidths of the filters
decreased. The result was that the 14-channel DSG filter bank was
defined as the TVP filter bank having an initial filter bandwidth of 200
Hz. The iterative definition of these filter banks is again given by

$$BW_{n+1} = BW_{n}\,(2)^{1/13}$$

$$CF_{n+1} = CF_{n} + (CF_{n} - CF_{n-1})\,(2)^{1/6}$$

where $BW_{1} = 200$, $CF_{1} = 350$, and $CF_{0} = 350 - 100/(2)^{1/6}$.

Shown also are bandwidths with $BW_{1} = 300$, $BW_{1} = 250$, and
$BW_{1} = 150$.


TABLE 22. FILTER BANK DEFINITIONS

           BISS ADM        TOTAL VOICE               TVP
          (DESIGNED)       (DESIGNED)             (DSG/VVU)
FILTER    C.F.     BW.     C.F.     BW.      C.F.     BW.    BW.    BW.    BW.

   1       410     290      350     300       350     200    300    250    150
   2       575     290      450     300       450     211    307    259    162
   3       741     290      555     310       562     223    314    269    174
   4       906     290      670     340       688     235    321    279    188
   5      1071     290      790     380       830     248    328    289    203
   6      1237     290      940     400       988     261    335    300    219
   7      1402     290     1120     400      1167     275    343    311    236
   8      1567     290     1320     400      1367     290    350    322    254
   9      1733     306     1550     400      1591     306    358    334    274
  10      1898     290     1810     400      1843     323    366    346    296
  11      2063     290     2100     400      2126     341    374    359    319
  12      2229     290     2430     400      2443     360    383    372    344
  13      2394     290     2800     400      2800     379    391    386    371
  14      2559     290     3350     700      3200     400    400    400    400
  15     *2725     290     4000     700    **3649     422    409    415    431
  16     *2890     290     4650     700   ***4153     445    418    430    465

*   Not used in ADM, although implemented in actual filters.
**  As defined for CCD analyzer; does not exist in DSG filter board.
*** Cut-off freq for high pass filter as defined for CCD analyzer;
    does not exist on DSG board.
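
The iterative definition given above Table 22 can be checked numerically against the TVP (DSG/VVU) columns. The sketch assumes the center-frequency spacing grows by a factor of 2^(1/6) per filter and the bandwidth by 2^(1/13) per filter, starting from BW_1 = 200 and CF_1 = 350 with CF_0 = 350 - 100/2^(1/6). The function name `tvp_filter_bank` is hypothetical.

```python
def tvp_filter_bank(bw1=200.0, cf1=350.0, n_filters=14):
    """Generate center frequencies and bandwidths by the recursion."""
    r_cf = 2 ** (1 / 6)       # growth of center-frequency spacing
    r_bw = 2 ** (1 / 13)      # growth of bandwidth
    cf_prev = cf1 - 100 / r_cf            # CF_0
    cfs, bws = [cf1], [bw1]
    for _ in range(n_filters - 1):
        cf_next = cfs[-1] + (cfs[-1] - cf_prev) * r_cf
        cf_prev = cfs[-1]
        cfs.append(cf_next)
        bws.append(bws[-1] * r_bw)
    return cfs, bws


cfs, bws = tvp_filter_bank()
print([round(c) for c in cfs])
print([round(b) for b in bws])
```

Rounded to the nearest hertz, the output matches the first 14 rows of the TVP center-frequency column and the 200 Hz bandwidth column. The other bandwidth columns all reach 400 Hz at filter 14, which suggests a per-filter ratio of (400/BW_1)^(1/13) for them; that is an inference from the table and is not stated in the text.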
3. PERFORMANCE MEASURES

The traditional error measure in measuring speaker verification
performance has been the equal error point, i.e., the percent error at
the acceptance threshold where the Type I (true speaker rejection) and
the Type II (impostor acceptance) error rates are equal. Sometimes,
however, it is not the equal error rate that is of interest, but the
percent of one type of error given an acceptable value for the other
error. An example of this is the BISS desire for the lowest Type II
error rate given a Type I error rate of 1%. This complicates the
comparison of systems that have reported their error rates in different
ways. The observation that the product of the Type I error rate and the
Type II error rate at a given threshold is relatively constant for all
thresholds has prompted Texas Instruments to begin measuring performance
with the use of a probability product, as defined by


[Equation illegible in the source scan. It defines the probability
product from the quantities listed below, summed over the thresholds k.]

where

N_TS  = total number of true speaker trials,

N_IM  = total number of impostor trials,

N_TSk = number of true speaker errors at threshold "k",

N_IMk = number of impostor errors at threshold "k".
In fact, for smaller experiments, a smoothing of the product has been
found to be desirable, as defined by

[Equation illegible in the source scan.]

where the parameters are as defined above. Both of these error measures
were used in this chapter to evaluate performance of the various
experiments.
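
The equal error point described at the start of this subsection can be located with a simple threshold sweep. The sketch below is illustrative only: it assumes a user is accepted when the decision function is below the threshold, it tries only thresholds equal to observed scores, and the scores are made-up values. It does not attempt the probability product, whose defining equations are not legible in this copy.

```python
def error_rates(true_scores, impostor_scores, threshold):
    """Type I and Type II error rates at one acceptance threshold."""
    type1 = sum(s >= threshold for s in true_scores)     # true speaker rejected
    type2 = sum(s < threshold for s in impostor_scores)  # impostor accepted
    return type1 / len(true_scores), type2 / len(impostor_scores)


def equal_error_point(true_scores, impostor_scores):
    """Return (threshold, type1_rate, type2_rate) with the closest rates."""
    best = None
    for t in sorted(set(true_scores) | set(impostor_scores)):
        r1, r2 = error_rates(true_scores, impostor_scores, t)
        if best is None or abs(r1 - r2) < best[0]:
            best = (abs(r1 - r2), t, r1, r2)
    return best[1:]


# Made-up decision function values for illustration.
true_scores = [1.0, 1.2, 1.4, 1.8]
impostor_scores = [1.3, 1.7, 2.0, 2.2]
eep = equal_error_point(true_scores, impostor_scores)
print(eep)
```

With real data the two rates rarely coincide exactly at any tested threshold, so the threshold with the smallest gap is reported.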


4 .  RESULTS 


The  results  of  the  experiments  run  on  the  11-speaker  data  base  are 
shown  in  Tables  23  and  24.  Table  23  shows  the  expected  (scanning)  er- 


Speaker  Expected  Errors 


I _  Enrollment 


Speaker 

1  BISS 

IPM0D2 

TVP( 300) 

TVP250 

TVP200 

TVP150 

BS 

142 

156 

152 

148 

149 

152 

CC 

88 

118 

102 

93 

95 

100 

CW 

161 

185 

192 

194 

190 

196 

TK 

83 

107 

92 

85 

85 

87 

GD 

85 

108 

107 

96 

98 

98 

JH 

92 

108 

99 

88 

84 

91 

Male  Average 

109 

130 

124 

117 

117 

121 

(Male  o) 

(34) 

(33) 

(40) 

(44) 

(43) 

(44) 

BH 

118 

154 

149 

130 

131 

128 

DD 

100 

145 

119 

111 

111 

117 

FG 

116 

119 

115 

111 

113 

124 

YH 

142 

159 

156 

183 

182 

199 

LM 

131 

160 

142 

133 

140 

129 

Female  Average 

121 

148 

136 

134 

135 

139 

(Female  o) 

(16) 

(17) 

(18) 

(29) 

(29) 

(34) 

Average 

114 

138 

130 

125 

125 

129 

(a) 

(27) 

(27) 

(31) 

(37) 

(37) 

(39) 

1 

After  21 

Sessions 

Speaker 

1  BISS 

IPM0D2 

TVP(300) 

TVP250 

TVP200 

TVP150 

BS 

104 

117 

123 

116 

114 

116 

CC 

77 

103 

90 

81 

82 

82 

CW 

124 

136 

157 

142 

142 

145 

TK 

75 

97 

102 

90 

88 

91 

GD 

88 

104 

113 

102 

101 

105 

JH 

79 

95 

91 

84 

81 

83 

Male  Average 

91 

109 

113 

103 

101 

104 

(Male  a) 

(19) 

(15) 

(25) 

(23) 

(24) 

(24) 

BH 

111 

124 

138 

126 

132 

135 

DD 

88 

119 

95 

92 

93 

99 

FG 

106 

117 

113 

107 

111 

123 

YH 

107 

131 

109 

124 

131 

142 

LM 

112 

138 

133 

123 

125 

129 

Female  Average 

105 

126 

118 

114 

118 

126 

(Female  o) 

(10) 

(9) 

(18) 

(15) 

(16) 

(16) 

Average 

97 

116 

115 

108 

109 

114 

(  o  ) 

(17) 

(15) 

(21) 

(20) 

(22) 

(23) 

Table 24

Performance Comparison of Processing Methods

Each entry: smoothed probability product : raw probability product :
errors at equal error point : threshold  (- = not shown)

Processing
Method      D = e              D = e/e            D = CIC            D = CIC'

BISS
  Male      -:-:26:170         -:-:35:1.61        -:-:24:1.51        -
  Female    -:-:76:162         -:-:38:1.54        -:-:47:1.45        -
  Total     -:-:104:164        -:-:75:1.57        -:-:71:1.47        -

IPMOD2
  Male      0.64:-:21:200      0.74:-:23:1.65     0.60:-:18:1.46     0.56:-:17:1.53
  Female    1.91:-:58:192      1.13:-:30:1.51     1.41:-:34:1.44     1.13:-:28:1.48
  Total     1.70:-:77:195      0.93:-:59:1.56     0.95:-:52:1.45     0.82:-:45:1.50

TVP(300)
  Male      -:-:30:202         -:-:29:1.66        -:-:22:1.58        -
  Female    -:-:41:205         -:-:23:1.59        -:-:23:1.54        -
  Total     -:-:72:204         -:-:52:1.63        -:-:43:1.56        -

TVP250
  Male      0.76:0.83:25:188   0.72:0.73:26:1.65  0.55:0.57:16:1.51  0.47:0.47:17:1.52
  Female    1.17:1.19:37:199   0.76:0.77:21:1.62  0.87:0.87:23:1.56  0.71:0.74:21:1.58
  Total     0.94:0.98:62:194   0.74:0.75:47:1.65  0.70:0.71:39:1.54  0.58:0.57:40:1.55

TVP200
  Male      0.80:0.79:25:188   0.79:0.75:29:1.64  0.66:0.61:19:1.45  0.59:0.53:16:1.46
  Female    1.05:1.08:32:203   0.70:0.73:18:1.60  0.75:0.74:18:1.59  0.65:0.66:16:1.57
  Total     0.93:0.93:54:196   0.74:0.74:46:1.62  0.72:0.70:38:1.52  0.63:0.59:36:1.51

TVP150
  Male      0.74:0.77:26:194   0.81:0.79:30:1.64  0.62:0.64:19:1.47  0.55:0.53:17:1.45
  Female    0.74:0.76:23:225   0.55:0.51:16:1.70  0.50:0.54:14:1.62  0.51:0.52:13:1.64
  Total     0.85:0.86:52:207   0.68:0.67:45:1.65  0.64:0.64:33:1.54  0.59:0.57:33:1.54

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rors  for  all  speakers,  both  after  enrollment  and  after  21  sessions,  for 
all experimental conditions. Table 24 shows the verification results
for up to four different decision functions: the average error, the
normalized average error, and the two decision functions (D and D')
defined in Section IV. Up to four numbers are shown for each experiment:
the average (smoothed) probability product, the raw probability product,
the number of errors at the equal error rate, and the threshold at the
equal error rate. Note that there were 1512 male true speaker trials,
1260 female true speaker trials, 1440 male impostor trials, and 960
female impostor trials.
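
The "equal error" threshold reported in Table 24 can be illustrated with a short sketch. The report does not give its search procedure, so the function below is only an assumption: it sweeps candidate thresholds over the observed scores and keeps the one at which the true speaker rejection rate and the impostor acceptance rate are closest. The scores are made-up scanning errors, not data from the experiments, and a trial is assumed to be accepted when its score is at or below the threshold.

```python
def equal_error_point(true_scores, impostor_scores):
    """Return (threshold, false_rejects, false_accepts) at the threshold
    where the two error rates are closest (hypothetical helper).

    Scores are distances (scanning errors): a trial is accepted when
    its score is <= threshold.
    """
    best = None
    for t in sorted(set(true_scores) | set(impostor_scores)):
        false_rejects = sum(1 for s in true_scores if s > t)
        false_accepts = sum(1 for s in impostor_scores if s <= t)
        gap = abs(false_rejects / len(true_scores)
                  - false_accepts / len(impostor_scores))
        # Keep the first threshold with the smallest rate difference.
        if best is None or gap < best[0]:
            best = (gap, t, false_rejects, false_accepts)
    return best[1], best[2], best[3]


# Illustrative scores only (not taken from the report).
true_scores = [90, 95, 100, 104, 110, 118, 125, 160]
impostor_scores = [120, 150, 170, 180, 190, 200, 210, 230]
threshold, fr, fa = equal_error_point(true_scores, impostor_scores)
print(threshold, fr, fa)
```

With equal numbers of true speaker and impostor trials, equal rates mean equal counts; with unequal numbers of trials, as in the experiments above, the rates rather than the counts are balanced.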

The first two experiments, labeled BISS and IPMOD2 in the tables,
were both run using the simulated BISS filter bank to test for any
difference caused by the preprocessing. The other four experiments
were run using the simulated TVP filter banks, with varying bandwidths.
The first experiment was run using the BISS-type preprocessing. All of
the other experiments were run using the DSG (IPMOD2) type
preprocessing.


Comparison  of  the  first  two  experiments  clearly  shows  the  results 
for  all  three  decision  functions  in  Table  24  are  better  using  the  DSG 
(IPMOD2)  preprocessing. 

Comparison  of  the  second  and  third  experiments  shows  that  the  male 
performance  degrades  slightly,  while  the  female  performance  improves 
dramatically.  Although  the  overall  error  rate  decreases  for  the  TVP 
filters,  more  importantly,  the  error  rate  becomes  more  uniform  across 
sex. 


The  performance  also  improves  for  the  TVP  filters  as  the  bandwidth 
of  the  bandpass  filters  is  decreased.  Since  it  was  believed  that  the 
bandwidth  for  the  TVP150  set  of  filters  left  too  many  spectral  holes  and 
that  its  lower  error  rate  might  be  an  artifact  caused  by  too  small  a 
sample  size,  the  TVP200  filter  bank  was  chosen  for  the  DSG/WU  systems. 

Performance  clearly  improves  from  left  to  right  in  Table  24. 
More  importantly,  it  is  the  female  performance  that  most  drastically  im¬ 
proves  between  the  first  column  (unnormalized  decision  function)  and  the 
other  three  columns  (all  normalized  decision  functions) . 


B.  ON-LINE OPERATIONAL TESTING FOR 980-BASED CIC SYSTEM

1.  THE DATA

The data used to derive the true speaker (Type I) performance were
the same data used during the normal operation of the booth. Almost
none of these data were actually recorded, as they were during the VVU
operational test. The statistics for these tests were those derived
from the results stored for every access attempt. The information
recorded for every entry attempt consisted of data such as times and
values of the scanning errors at the selected reference point location,
expected values of scanning errors for the user for that trial, trial
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number,  etc. 


In addition to the data used during the normal operation of the
booth, two sets of data were collected on-line in the CIC entry control
booth for performing off-line Type II trials against the actual CIC
reference files. The data collected were the preprocessed data
(filtered, regressed, normalized, and quantized) from specially
collected sessions of 12 phrases each (three repetitions of four
phrases). Since the CIC system was running with both types of
preprocessing (BISS type and DSG or IPMOD2 type), data were collected
from both types of users. The number of impostor sessions was 40 (all
males) for the BISS-type preprocessing and 32 (23 males, 9 females) for
the DSG-type (IPMOD2) preprocessing. The number of references varied,
depending on the number available when a particular experiment was
performed.

2.  SYSTEM  MODIFICATION  LOG 

The log in Table 25 is given to correlate the modifications to the
system with any changes in performance. When the DSG-type preprocessing
was installed, a common verification algorithm was used, but two
separate preprocessing routines and some parameters were maintained.
Also, after this time, all enrollments were performed with the new
preprocessing; hence the performance for the BISS-type preprocessing
would naturally improve due to the increasing maturity of the user
population when no new users were being enrolled.

Note  that  references  to  different  decision  functions,  different  en¬ 
ergy  definitions,  and  various  sets  of  parameters  may  be  resolved  by 
referring  to  the  description  of  the  algorithm  modifications  in  Section 
IV  of  this  report. 

3.  RESULTS  OF  ON-LINE  OPERATIONAL  TESTING 

The results of both the on-line testing with the operational CIC
system and the off-line impostor testing are shown in Tables 26 and 27,
for the BISS and DSG (IPMOD2) preprocessings, respectively. In both
tables the test conditions are given at the top, with both true speaker
and impostor test results for a given set of conditions given below.
Notable in these tables is the lack of performance improvement as the
result of any of the parametric modifications. The only significant
change for the true speaker results is the shift of some of the
verifieds from taking one phrase to taking two phrases when the
one-phrase, normal mode decision threshold is changed from 105 to 100
(changing from decision threshold set 2 to set 3). The impostor testing
results, however, show that the impostor acceptance rate when using the
DSG (IPMOD2) preprocessing is about half of the impostor acceptance rate
when using the BISS preprocessing. There is probably even more
difference than shown in the tables of results, because the BISS
preprocessing Type II results are for only males, and the results would
be worse if both males and females were used in the Type II tests.
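
The "about half" comparison can be checked from the overall impostor-accept counts listed in Tables 26 and 27. The sketch below simply recomputes the percentages; the variable and function names are illustrative, and the choice of the last column of each table for the ratio is an assumption made here, not a statement of how the authors compared them.

```python
def accept_rate(accepts, trials):
    """Impostor acceptance rate in percent."""
    return 100.0 * accepts / trials


# (accepts, trials) pairs as listed under "Imp. Accepts: Overall".
biss = [(134, 10320), (142, 10320), (132, 10320), (129, 10320)]   # Table 26
ipmod2 = [(11, 857), (7, 857), (8, 1223)]                         # Table 27

biss_rates = [round(accept_rate(a, n), 2) for a, n in biss]
ipmod2_rates = [round(accept_rate(a, n), 2) for a, n in ipmod2]

# Ratio of the final IPMOD2 rate to the final BISS rate.
ratio = ipmod2_rates[-1] / biss_rates[-1]
print(biss_rates, ipmod2_rates, round(ratio, 2))
```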

Finally,  the  impostor  data  using  just  the  single-phrase  average 
scanning  error  were  combined  with  single-phrase  average  scanning  errors 
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collected  from  the  actual  CIC  system,  and  was  used  to  plot  Type  I/Type 
II  curves  for  a  one-phrase  strategy  for  both  BISS  and  IPM0D2  types  of 
preprocessing. To obtain plots with comparable abscissas, the IPMOD2
average scanning errors were multiplied by "R," where "R" is the ratio
of the average true speaker scanning errors for IPMOD2 preprocessing to
that for BISS preprocessing. A plot of the average scanning errors for
single phrases is shown in Figure 20, where "R" is 1.25.

TABLE  25.  LOG  OF  CHANGES  TO  THE  980-BASED  CIC  SYSTEM 

Date  BISS  PREPROCESSING  DSG  PREPROCESSING 

spring  77  7-phrase  dec.  strat.  installed 

11/  /77  Statistics  collection  begun 

7/12/78  Set  1  decn  threshld  installed 

wk. 10/16/78  1.  Max  VPE  from  600  to  400 

2.  proc.  cut-off  delta  from  5 
to  20.  (only  suggested??) 

11/18/78  Set  2  dec.  thrshlds  installed 

03/13/79  D'  installed  DSG  prepro.  installed  with 

D'  &  max  ehat  after  enroll 
=  1.2  *  BISS  parameters. 
Set  2  dec  thrshlds 
Set  2  emax/emin's 

Energy  def.  2 
Energy  PVR  from  10  to  20 
Min  EOS  Ener.  frm  70  to  40 
No.  EOS  Smpls  frm  50  to  70 
Max  VPE  frm  400  to  480 

wk. 06/03/79  Max.  No.  same  phr  reprompts  Max.  No.  same  phr  reprompts 
for  PE  &  Norm  from  2  to  1  for  PE  &  Norm  from  2  to  1 

wk. 07/09/79  Normal  1  phrase  dec  thrsh.  Normal  1  phrase  dec  thrsh. 

from  105  to  100  from  105  to  100 

Set  3  emax/emin's 
max  ehat  after  enroll  from 
192  to  200 

Max  VPE  from  480  to  500 


04/05/79  Energy  def.  2 

Energy  PVR  from  10  to  20 
Min  EOS  Energy  from  70  to  40 
No.  EOS  Samples  from  50  to  70 


05/18/80 


(980  ASV  system  removed  from  CIC) 


TABLE 26.  ON-LINE TRUE SPEAKER AND OFF-LINE IMPOSTOR RESULTS FOR CIC
ENTRY CONTROL BOOTH USING BISS-TYPE PREPROCESSING

Test Conditions:
Dec'n Fcn:       D         D         D         D'        D'        D'        D'
Max VPE:         600       400       400       400       400       400       400
Dec'n Thres:     Set 1     Set 1     Set 2     Set 2     Set 2     Set 2     Set 3
Energy Def:      Def 1     Def 1     Def 1     Def 1     Def 2     Def 2     Def 2
Energy PVR:      10        10        10        10        20        20        20
Min EOS En:      70        70        70        70        40        40        40
# Smpls EOS:     50        50        50        50        70        70        70
Max # Same Phr Reprompts:
  PE:            2         2         2         2         2         2         2
  PPE/Norm:      2         2         2         2         2         1         1
Enrollments:     yes       yes       yes       no        no        no        no

On-line True Speaker Trials:
Srt Date:        07/18/78  10/24/78  11/21/78  03/13/79  04/07/79  06/05/79  07/15/79
End Date:        10/24/78  11/21/78  03/12/79  04/04/79  06/05/79  07/10/79  05/18/80
# Trials:        52,419    14,458    48,759    9,938     23,360    14,851    91,266
Not Verf'd:      0.51%     0.59%     0.67%     0.78%     0.62%     0.45%     0.35%
                 (265)     (86)      (328)     (78)      (145)     (67)      (316)
No Respnse:      0.20%     0.10%     0.25%     0.16%     0.28%     0.11%     0.10%
                 (104)     (14)      (122)     (16)      (66)      (16)      (93)
% Verf'd on
  phrase 1:      74.35%    72.55%    71.80%    74.14%    74.13%    75.52%    70.21%
         2:      18.82     19.30     20.74     19.24     19.52     18.78     23.77
         3:       4.05      4.50      4.55      3.80      3.96      3.64      3.86
         4:       1.75      1.66      1.80      1.80      1.43      1.28      1.34
         5:       0.59      0.86      0.69      0.58      0.55      0.42      0.49
         6:       0.31      0.32      0.26      0.28      0.28      0.25      0.21
         7:       0.13      0.12      0.16      0.16      0.13      0.11      0.11

Off-line Impostor Trials (10320 trials):
Test Dates:      09/18/78, 11/13/78, 02/05/79
Max VPE:         600       600       600       400
Imp. Accepts:
  Overall:       1.30%     1.38%     1.28%     1.25%
                 (134)     (142)     (132)     (129)
Normal Mode:
  1 Phrase:      36        36        33        33
  2 Phrases:     22        22        23        22
  3 Phrases:     14        14        16        14
  4 Phrases:     15        23        21        17
Prej. Mode:
  1 Phrase:      10        9         2         2
  2 Phrases:     11        12        14        18
  3 Phrases:     12        10        6         6
  4 Phrases:     14        16        17        17
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TABLE  27.  ON- 

-LINE TRUE SPEAKER AND OFF-LINE IMPOSTOR RESULTS FOR CIC ENTRY CONTROL
BOOTH USING IPMOD2-TYPE PREPROCESSING

Test Conditions:
Dec'n Function:        D'        D'        D'        D'        D'
Max VPE (TS trials):   400       480       480       500
Max VPE (IM trials):   480       ?         ?
Dec'n Thres:           Set 2     Set 2     Set 2     Set 3     Set 3
Energy Def:            Def 1     Def 2     Def 2     Def 2     Def 2
Emax/Emin:             Set 1     Set 1     Set 1     Set 2     Set 2
Energy PVR:            10        20        20        20        20
Min EOS En:            70        40        40        40        40
# Smpls EOS:           50        70        70        70        70
Max # Same Phr Reprompts:
  PE:                  2         2         2         2         2
  PPE/Norm:            2         2         1         1         1
Enrollments:           yes       yes       yes       yes
# male/female IM's:    16/7      16/7      23/9
# male/female refs:    50/9      50/9      ?

On-line True Speaker Trials:
Srt Date:              03/13/79  04/07/79  06/05/79  07/15/79
End Date:              04/04/79  06/05/79  07/10/79  05/18/80
# Trials:              191       2,902     3,288     44,406
Not Verf'd:            3.14%     0.69%     0.49%     0.70%
                       (6)       (20)      (16)      (312)
No Respnse:            0.16%     0.28%     0.11%     0.10%
                       (16)      (66)      (16)      (93)
% Verf'd on
  phrase 1:            57.38%    57.38%    66.17%    59.80%
         2:            21.86     25.95     23.52     30.04
         3:             4.92      7.52      5.30      5.84
         4:            11.48      6.11      3.09      2.77
         5:             1.64      1.63      1.19      0.97
         6:             1.64      1.11      0.55      5.44
         7:             1.09      0.28      0.18      0.15

Off-line Impostor Trials (10320 trials):
Test Date:             07/10/79  07/10/79  08/08/79
Trials:                857       857       1,223
Imp. Accepts:
  Overall:             1.28%     0.82%     0.65%
                       (11)      (7)       (8)
Normal Mode:
  1 Phrase:            6         3         3
  2 Phrases:           1         1         1
  3 Phrases:           0         0         0
  4 Phrases:           0         0         0
Prej. Mode:
  1 Phrase:            1         0         0
  2 Phrases:           0         1         2
  3 Phrases:           0         0         0
  4 Phrases:           3         2         2
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SECTION  VI 


TEST  OF  THE  VOICE  VERIFICATION  UPGRADE  SYSTEM 

Enrollment of users on the Voice Verification Upgrade (VVU) system,
installed to control one of the entries to the Semiconductor Building at
Texas Instruments in Dallas, began on 12 December 1980. Operation on a
24-hour per day basis began on 2 January 1981 and continued through 17
June 1981. New users were enrolled on the system throughout its
operation, for a total of 286 users (200 men, 86 women). These users
produced 13,539 trials (presumably all "true-speaker" trials), with a
true speaker rejection rate of 2.26% overall and 1.04% during the normal
mode of operation (i.e., more than 11 prior verifications).

During normal usage of the system, updated reference files were
selectively captured for later use in casual impostor trials. The
reference files saved were those produced after enrollment and after
1, 2, 4, 8, 16, 32, 64, etc. previous verifications. At the end of May
1981, one special impostor session was collected from each of 75 users
(51 men, 24 women) who volunteered. Each impostor session was collected
in the same entry control booth with the same type of protocol. The
only difference was that twelve phrases (three repetitions of a set of
four phrases) were required rather than the usual one or two phrases.
The data were preprocessed and sent from the terminal processor to the
host processor, where they were stored on disk.

The  next  two  subsections  describe  the  results  of  the  normal  booth 
usage  (true  speaker  rejection  rates)  and  the  casual  impostor  testing  re¬ 
sults. 


A.  "TRUE  SPEAKER"  TESTING 

The "true speaker" (assuming no casual impostor attempts by the
users) testing was performed using the VVU system in an operational
environment controlling an entrance to the Semiconductor Building at
Texas Instruments. Since there were other entrances to this building
controlled either by guard or by closed-circuit television, all users
enrolled on the voice system had the option of using other entrances;
however, the placement of the voice controlled entrance was chosen to
provide the benefit of convenience to many of those using the entrance.
Although enrollments began on the system on 12 December 1980,
verifications on a 24-hour per day basis did not begin until 2 January
1981. The system remained operational except for occasional maintenance
until its delivery to RADC on 17 June 1981.

The true speaker testing yielded a higher percentage of
not-verifieds than was expected. However, all users of the system were
inexperienced in the use of such a system, with no large population of
"veteran" users such as existed when data were first collected either
from the old 980-based CIC system or from the newer 990-based CIC
system. (Most users of the newer system had been enrolled on the old CIC
system.) The gross rejection rate from 2 January 1981 through 17 June
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1981  was  2.26%  (306  rejections  out  of  13,539  total  trials)  by  247  users 
(194  males,  64  females) .  (39  of  the  286  users  never  used  the  system 
after  enrolling.)  The  improved  performance  as  a  function  of  increasing 
experience  of  the  user  population  can  be  seen  by  comparing  the  gross  re¬ 
jection  rate  of  0.69%  (5  rejections  out  of  719  total  trials)  for  the 
last  two  weeks  of  operation  of  the  booth,  to  the  2.26%  for  the  entire 
testing  period. 


Typically, eighteen (7.3%) of the users accounted for 156 (51%) of
the errors. Table 28 shows the distribution of the number of
not-verifieds across the user population.

TABLE 28.  NOT VERIFIEDS ACROSS THE USER POPULATION

# OF N.V.   # OF USERS      # OF N.V.   # OF USERS
    0          144              8            2
    1           43              9            5
    2           25             10            1
    3           11             11            0
    4            6             12            2
    5            4             13            0
    6            1             14            2
    7            1
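
The concentration of errors among a few users can be recomputed directly from the Table 28 distribution. The sketch below is only a cross-check of the published figures; the dictionary layout and the helper name are illustrative choices, not part of the report.

```python
# Table 28: number of not-verifieds -> number of users with that many.
nv_distribution = {0: 144, 1: 43, 2: 25, 3: 11, 4: 6, 5: 4, 6: 1, 7: 1,
                   8: 2, 9: 5, 10: 1, 11: 0, 12: 2, 13: 0, 14: 2}

total_users = sum(nv_distribution.values())
total_nv = sum(nv * users for nv, users in nv_distribution.items())


def errors_from_worst_users(distribution, n_users):
    """Total not-verifieds produced by the n_users with the most errors."""
    remaining = n_users
    errors = 0
    for nv in sorted(distribution, reverse=True):
        take = min(distribution[nv], remaining)
        errors += take * nv
        remaining -= take
        if remaining == 0:
            break
    return errors


worst18 = errors_from_worst_users(nv_distribution, 18)
share_of_users = 100.0 * 18 / total_users     # about 7.3%
share_of_errors = 100.0 * worst18 / total_nv  # about 51%
print(total_users, total_nv, worst18)
```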

The importance of the reenrollment capability is demonstrated by
the dramatic performance improvement shown in Table 29 for the eight
users who were reenrolled.

TABLE 29.  PERFORMANCE IMPROVEMENT FOR EIGHT REENROLLED SPEAKERS

              BEFORE REENROLLMENT     AFTER REENROLLMENT
USER          #N.V./#USES             #N.V./#USES
209230          8/8                     4/140
598247          3/4                     0/31
674419          8/43                    2/33
1762517         3/3                     0/3
3452604        13/16                    1/3
9540423         2/6                     0/24
12462163        9/9                     0/29
14020636        4/5                     5/60
TOTALS         50/94  (53.2%)          12/323  (3.7%)

When the statistics before reenrollment for these eight users are
excluded, the overall rejection rate becomes 1.95% (256 rejections out
of 13,139 uses).
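
The totals in Table 29 follow from the per-user counts, as the short check below shows. It only sums the table entries and subtracts the pre-reenrollment rejections from the 306 overall not-verifieds; the names are illustrative.

```python
# Table 29: user -> ((not-verifieds, uses) before, (not-verifieds, uses) after)
reenrolled = {
    209230: ((8, 8), (4, 140)),
    598247: ((3, 4), (0, 31)),
    674419: ((8, 43), (2, 33)),
    1762517: ((3, 3), (0, 3)),
    3452604: ((13, 16), (1, 3)),
    9540423: ((2, 6), (0, 24)),
    12462163: ((9, 9), (0, 29)),
    14020636: ((4, 5), (5, 60)),
}

nv_before = sum(before[0] for before, _ in reenrolled.values())
uses_before = sum(before[1] for before, _ in reenrolled.values())
nv_after = sum(after[0] for _, after in reenrolled.values())
uses_after = sum(after[1] for _, after in reenrolled.values())

rate_before = 100.0 * nv_before / uses_before
rate_after = 100.0 * nv_after / uses_after

# Rejections left once the pre-reenrollment data are excluded.
remaining_rejections = 306 - nv_before
print(round(rate_before, 1), round(rate_after, 1), remaining_rejections)
```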


Of special interest is the performance as a function of the number
of prior verifications, which is shown in Table 30 for the CIC and
VVU 990-based systems. The rejection rates shown are 16.15% (12.62%
excluding data prior to reenrollments) for the "post-enrollment" (0-3
prior verifications) period, [...] for the "post-post-enrollment" (4-11
prior verifications) period, and [...] (0.96% excluding data prior to
reenrollments) for the "normal" (more than 11 prior verifications)
period.

TABLE 30.  PERFORMANCE AS A FUNCTION OF NUMBER OF PRIOR
VERIFICATIONS FOR THE 990-BASED SYSTEMS

                  CIC (since 3/11/80)        VVU (since 1/2/81)
                  # trials       %           # trials       %
Verified:           43,418     99.56           13,233     97.74
Not-verified:          193      0.44              306      2.26

Reject rate vs # of prior sessions  (-- = not legible in the source):

# of prior    CIC                              VVU
sessions      # trials  # Errors  % Error      # trials  # Errors  % Error
0-3                371       --     3.77          1,065       172    16.15
4-11               770       --     0.52          1,492        22     1.47
12-27            1,891       --     0.69          2,504        41     1.64
28-59            3,699       --     0.54          3,712        26     0.70
60-123           6,521       --     0.25          3,949        37     0.94
124-251          6,498       --     0.51            814         8     0.98
252-507         11,246       --     0.45              0         0       -
508-1019        12,323       --     0.39              0         0       -
1020-2043          283       --     0.00              0         0       -
2044-4091            0                                0         0
4092-8187            0                                0         0
>8187                0                                0         0
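
The VVU columns of Table 30 can be pooled into the three operating periods named in the text. The sketch below does this from the tabulated trial and error counts; the grouping function is illustrative. Because the tabulated VVU trial counts sum to slightly fewer than the 13,539 total trials, the pooled "normal" figure comes out near, but not exactly at, the 1.04% quoted earlier.

```python
# VVU rows of Table 30: (first prior session, last prior session, trials, errors)
vvu_rows = [
    (0, 3, 1065, 172),
    (4, 11, 1492, 22),
    (12, 27, 2504, 41),
    (28, 59, 3712, 26),
    (60, 123, 3949, 37),
    (124, 251, 814, 8),
]


def pooled_reject_rate(rows, lo, hi):
    """Percent rejected over all rows whose session range lies in [lo, hi]."""
    trials = sum(t for first, last, t, e in rows if first >= lo and last <= hi)
    errors = sum(e for first, last, t, e in rows if first >= lo and last <= hi)
    return 100.0 * errors / trials


post_enrollment = pooled_reject_rate(vvu_rows, 0, 3)
post_post_enrollment = pooled_reject_rate(vvu_rows, 4, 11)
normal = pooled_reject_rate(vvu_rows, 12, 251)
print(round(post_enrollment, 2), round(post_post_enrollment, 2), round(normal, 2))
```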


Also of interest is the number of retries for users who are not
verified. The 306 not-verifieds occurred during 195 sessions. This
means that although the overall reject rate was 2.26% per trial, the
overall reject rate on a booth passage basis, which is the real number
of interest for an entry control system, was only 0.63% (84 rejections
out of 13,317 access attempts). Although the users were instructed to
retry once after their first rejection, but to quit and call for
assistance (an override) after two rejections, Table 31 shows that a
significant number of the users did not follow instructions. Since
there were only 306 not-verifieds during the testing, those users who
tried several times in succession added a significant negative bias to
the overall user acceptance rate. In addition, the retry capability
also affords the opportunity for casual impostor attempts by curious
users, also adversely affecting the user acceptance rate. However, the
necessity of at least a limited retry capability can be seen by noting
that 71 percent of the users who tried more than once were finally
verified (60 percent on the second try), which, as explained above,
improves the acceptance rate on a booth passage basis. Table 31 shows
the access reject rate ([number who retry + cumulative number who quit]
/ [number of access attempts]) when the number of verification attempts
is limited to a specified number of trials. The data in this table
suggest that it might be prudent to limit the number of trials per user
I.D., per passage, to two or three.
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TABLE  31.  ACCESS  REJECT  RATE  (13,317  ACCESSES)  AS  A  FUNCTION  OF 

NUMBER  OF  RETRIES 


VERIF     # OF       NUMBER    CUM.      # NOT    # WHO    # WHO    CUM.     REJECT RATE IF
ATTEMPT   ATTEMPTS   VER'D     NUMBER    VER'D    RETRY    QUIT     # WHO    VERIF ATTMPTS
NUMBER                         VER'D                                QUIT     LIMITED

   1      13,317     13,122    13,122     195      157       38       38       1.46%
   2         157         94    13,216      63       35       28       66       0.76%
   3          35          9    13,225      26       15       11       77       0.69%
   4          15          5    13,230      10        8        2       79       0.65%
   5           8          2    13,232       6        3        3       82       0.64%
   6           3          -    13,232       3        2        1       83       0.64%
   7           2          1    13,233       1        1        -       83       0.63%
   8           1          -    13,233       1        1        -       83       0.63%
   9           1          -    13,233       1        0        1       84       0.63%
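
The last column of Table 31 can be reproduced from the other columns with the formula given in the text: (number who retry + cumulative number who quit) / (number of access attempts). The sketch below applies it row by row; the tuple layout is an illustrative choice.

```python
ACCESS_ATTEMPTS = 13317

# Table 31 rows: (attempt number, # of attempts, # verified, # who retry, # who quit)
rows = [
    (1, 13317, 13122, 157, 38),
    (2, 157, 94, 35, 28),
    (3, 35, 9, 15, 11),
    (4, 15, 5, 8, 2),
    (5, 8, 2, 3, 3),
    (6, 3, 0, 2, 1),
    (7, 2, 1, 1, 0),
    (8, 1, 0, 1, 0),
    (9, 1, 0, 0, 1),
]

reject_rates = []
cumulative_quit = 0
for attempt, n_attempts, n_verified, n_retry, n_quit in rows:
    cumulative_quit += n_quit
    # If attempts were limited to this number, everyone who would have
    # retried is rejected, along with everyone who has already quit.
    rate = 100.0 * (n_retry + cumulative_quit) / ACCESS_ATTEMPTS
    reject_rates.append(round(rate, 2))

print(reject_rates)
```

Most of the benefit of retries is obtained by the second or third attempt, which is consistent with the suggestion to limit the number of trials per passage to two or three.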

Although  these  performance  statistics  are  biased  by  the  large 
number  of  new  users  (lack  of  any  long-term  users),  the  performance  was 
still  not  as  good  as  expected,  based  on  prior  results  from  both  the 
980-based  and  the  990-based  systems  at  CIC.  The  reasons  for  the  poorer 
performance  are  judged  to  be: 

1)  The poor quality of the LPC prompting phrases (Although this
was also a problem on the new CIC system, a majority of the
users had been enrolled previously on the old 980 system,
which used high quality PCM prompting phrases, and hence
were familiar with the word set).

2)  The extended time (2-3 weeks) between enrollment and initial
use by the first speakers to be enrolled (due to delays in
completion of all of the software).

3)  Getting the subjects to stand close enough and speak up
loudly enough (Again, many of the users of the newer CIC
system were experienced due to their use of the old 980
system).

4)  Poor enrollments (This is felt to be due both to the poor
quality prompting and the newness of the task).

Although the performance statistics were generally not broken down
by sex for the VVU system, a few of the overall statistics were
calculated and are shown in Table 32.
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TABLE  32.  WU  PERFORMANCE  BY  SEX 


                                   MALES          FEMALES
Number of users
(with one or more uses):           [illegible]    [illegible]

Number of uses
  total:                           10,271         3,268
  ave. per user:                   56             51

Not-verifieds
  number:                          154            152
  percent:                         1.50%          4.65%

Ave. Ehat at End of Test:          97             134


A summary of the verification performance as a function of several
other parameters is shown in Table 33. Of special note are that 1) the
average number of phrases for verification was 1.76, 2) the reject rate
is higher with more than one person in the booth (3.3% vs. 2.0%),
3) the reject rate decreases with usage, and 4) the reject rate
increases as the expected "error" (data variance) for the speaker
increases.


B.  "CASUAL"  IMPOSTOR  TESTING 

The "casual" impostor testing was performed by using specially
collected impostor sessions from users enrolled on the system against
user reference files that had been stored during normal use of the
system. The impostor files consisted of preprocessed (regressed,
normalized and quantized) filter bank data for three repetitions of a
set of four phrases. The data were collected on-line in the VVU booth
by soliciting a voluntary impostor session from users during their
normal use of the booth. Data for a total of 75 impostors (51 men, 24
women) were collected.

The reference data were collected automatically during normal booth
usage by storing updated reference files after 0, 1, 2, 4, 8, 16, etc.
previous verifications. Due to the quantity, these data were stored
on the secondary disc drive, which on occasion was not on-line. Since
the reference file data collection was peripheral to the actual booth
operation, the VVU system would not cease operation if the secondary
storage medium was off-line. Hence, occasionally (1% of the time)
reference files were missed for some transactions. In addition, since
some users were reenrolled, duplicate reference files for the same
number of prior verifications were collected.

The actual impostor testing performed was to compare all impostors
against all other users (of the same sex) for which reference files
after 32 prior verifications existed. Of the 286 users (200 men, 86
women), only 126 of them (93 men, 33 women) completed 32 or more
verifications. Impostor trials were also run against this set of users
for their reference files immediately after enrollment. Since eight of
the 51 male impostors and four of the 24 female impostors had not
completed 32 sessions, this yielded 4,700 (43*92 + 8*93) independent
trials for
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Table  n(a) 


Table 33

Verification Performance Summary (Type I Analysis) for VVU ASV System
DATA FOR 1/2/1981 AT 2043 THRU 6/17/1981 AT [illegible]

# VERIFIED                        = 13233   97.74%
# NOT VERIFIED                    =   306    2.26%   (NV: 306  2.26%;  NO RESPONSE: 0  0.00%)
TOTAL # PHRASES IN VERIFICATION   = 23313   AVG FOR VERIFICATION = 1.76
TOTAL # PHRASES OVERALL           = 25343   AVG OVERALL          = 1.87
TOTAL # PHRASES NOT REGISTERED    =  1501   (5.92%)

****** ALL ******
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the  men  and  772  (20*32  +  4*33)  for  the  women  for  the  after  32-sea8ion 
reference  files  and  somewhat  less  (as  explained  earlier)  for  the  refer¬ 
ence  files  immediately  after  enrollment.  Hence,  input  from  one  speaker 
was  tested  against  no  more  than  two  reference  files  for  the  same  user. 
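
The trial counts quoted above (43*92 + 8*93 for the men, 20*32 + 4*33 for the women) follow from excluding each impostor's own reference file. The sketch below restates that arithmetic; the function and parameter names are illustrative.

```python
def independent_trials(n_impostors, n_references, n_impostors_with_reference):
    """Number of impostor/reference pairs, never pairing a speaker with
    his or her own reference file.

    n_impostors_with_reference: impostors who also have a reference file
    in the set (they face one fewer reference than the others).
    """
    n_without = n_impostors - n_impostors_with_reference
    return (n_impostors_with_reference * (n_references - 1)
            + n_without * n_references)


# Men: 51 impostors, 93 references, 8 impostors without a 32-session file.
men_trials = independent_trials(51, 93, 51 - 8)
# Women: 24 impostors, 33 references, 4 impostors without a 32-session file.
women_trials = independent_trials(24, 33, 24 - 4)
print(men_trials, women_trials, men_trials + women_trials)
```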

The test against the reference files after 32 prior sessions used
the parameters for the "normal" mode of verification and the test
against the reference files after enrollment used the parameters for
the "post-enrollment" mode of verification. The results of this casual
impostor testing are given in Table 34. Of the 20 errors against
enrollment reference files and the 44 errors against 32-session
reference files, only nine of the errors were for the same
impostor/reference pairs for the two types of reference files.

Also shown in Table 34 are the confidence levels that the true
error is less than 1.0%, based upon the error rate observed in these
experiments and assuming independent trials. For a fixed sample size,
the confidence level decreases as the observed error rate approaches the
desired upper bound on the error rate. This can be seen more graphically
by the curve in Figure 21 that shows the upper limit on the observed
error rate as a function of sample size necessary to ensure a 90%
confidence level that the true error rate is less than 1%. Also, for a
fixed observed error rate, the confidence level increases as the sample
size increases.

TABLE  34.  CASUAL  IMPOSTOR  TESTING  RESULTS  AND  CONFIDENCE  LEVELS  THAT 
TRUE  ERROR  RATE  IS  <  1.0%  BASED  UPON  OBSERVED  ERROR  RATE 


REFERENCE FILES        MEN                 WOMEN               OVERALL
AFTER                  #VER/#TRIALS        #VER/#TRIALS        #VER/#TRIALS

ENROLLMENT             19/4648 (0.41%)     1/795 (0.13%)       20/5443 (0.37%)
(Confidence level)     (99.9+%)            (99%)               (99.9+%)

32 VERIFICATIONS       43/4700 (0.91%)     1/772 (0.13%)       44/5472 (0.80%)
(Confidence level)     (69%)               (99%)               (92%)
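
The report does not state how the confidence levels in Table 34 were computed. One common reading, offered here only as an assumption, is a one-sided binomial test: if the true error rate were exactly 1.0%, the confidence is the probability of observing more errors than were actually seen in that many independent trials. The sketch below implements that reading with an exact binomial sum; it gives values close to the tabulated 69% and 92%, but this agreement does not establish that it was the authors' method.

```python
import math


def binomial_log_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), via log-gamma for large n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def confidence_below(errors, trials, bound=0.01):
    """1 - P(X <= errors) under a true error rate equal to `bound`.

    Assumed interpretation of the "confidence level" in Table 34.
    """
    cdf = sum(math.exp(binomial_log_pmf(k, trials, bound))
              for k in range(errors + 1))
    return 1.0 - cdf


men_32 = confidence_below(43, 4700)       # tabulated as 69%
overall_32 = confidence_below(44, 5472)   # tabulated as 92%
women_enroll = confidence_below(1, 795)   # tabulated as 99%
men_enroll = confidence_below(19, 4648)   # tabulated as 99.9+%
print(men_32, overall_32, women_enroll, men_enroll)
```

Under this reading, both effects described in the text appear directly: the confidence falls as the observed rate approaches the 1% bound, and for a fixed observed rate it rises with the number of trials.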


The  distribution  of  successful  impostors  and  of  references  that 
were  successfully  impersonated  was  very  nonuniform  for  the  men.  The 
same  phenomenon  would  be  expected  for  the  women  with  a  larger  sample 
size.  Those  impostors  successful  against  many  users  are  known  as 
"wolves,"  and  those  references  who  are  successfully  impersonated  by  many 
impostors  are  known  as  "eels."  These  distributions  are  shown  in 
Table  35. 
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TABLE  35.  DISTRIBUTION  OF  IMPOSTOR  SUCCESSES 
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The  existence  of 

fixed input data to compare against variable reference files also
allowed the observation that the adaptation of the reference files, and
not just the adaptation of the speaker, decreased the number of phrases
needed to verify (i.e., improved the performance). The average number
of phrases to verify as a function of the number of prior verifications
is given below for all successful verifications for all users. (For
this experiment, the restriction was removed that required at least
four phrases be prompted during the "post-enrollment" period.)

# OF PRIOR VERIFICATIONS:          0     1     2     4     8     16    32    64    128
AVERAGE # OF PHRASES TO VERIFY:    3.16  3.01  2.83  2.68  2.60  1.97  1.77  1.60  1.27


SECTION  VII 


RESIDUAL  ENERGY  BASED  SPEAKER  VERIFICATION 


A  variety  of  signal  processing  algorithms  exist  for  representing 
the  speech  signal  in  terms  of  time  varying  parameters.  These  parameters 
are  used  in  speech/speaker  recognition  for  determining  a  measure  of  sim¬ 
ilarity.  Examples  of  these  signal  processing  transformations  are  the 
direct  spectral  measurement  which  uses  either  a  bank  of  bandpass  filters 
or  a  discrete  Fourier  transform,  the  cepstrum,  or  a  set  of  suitable  par¬ 
ameters  of  a  linear  predictive  model.  Selection  of  the  parameter  set 
depends  to  a  considerable  degree  on  performance  and  implementation  con¬ 
siderations.  It  is  generally  agreed  that  the  linear  modeling  techniques 
have  performance  comparable  to  or  better  than  other  techniques.  Results 
summarized  in  an  RADC  report  [14]  indicate  that  the  recognition  error 
obtained  for  a  speaker  dependent  word  recognition  system  using  the 
LPC-based  recognition  techniques  is  a  factor  of  three  times  lower  than 
the recognition error using the bandpass filtering technique. The
linear predictive coding (LPC) based approach to speaker verification
was chosen for investigation during this contract because of its
performance and its ease of implementation in a system using a
single-chip digital signal processor designed to provide high-speed
multiply-accumulate operations.

A.  SIMILARITY  MEASURE 

Once  the  parameters  have  been  extracted  from  the  speech  waveform,  a 
similarity  measure  must  be  computed  between  those  parameters  and  a 
stored reference. The similarity measure used was patterned after that
of Itakura [15], which uses a normalized prediction residual. The LPC
prediction residual energy is measured by passing the input speech
signal (for the frame in question) through an all-zero inverse filter
representing the reference data. If the reference data match the input
data, then the spectral notches in the inverse filter will match the
spectral peaks in the input signal and a low-energy residual output will
result. This residual energy is normalized to a value no less than one
by dividing by the residual energy that results when the inverse filter
is optimally matched to the input data.

The  prediction  residual  is  computed  easily  as  the  inner  product  of 
the  autocorrelation  function  of  the  input  with  the  autocorrelation  func¬ 
tion  of  the  inverse  filter,  as  will  be  shown  later.  Normalization  by 
the residual of the input signal is not so simple. In essence, the
autocorrelation matrix must be inverted, and the traditional method of
choice is Levinson's algorithm. An improvement on this algorithm that
limits intermediate computations to a magnitude less than 1 was
demonstrated by Le Roux and Gueguen [16].

B.  MODIFICATIONS  TO  THE  INPUT  AUTOCORRELATION  FUNCTION 

Initial experiments indicated the desirability of modifying the
similarity measure. These modifications, however, were actually
implemented by adding some preprocessing to the input autocorrelation
sequence.

First,  an  inherent  problem  in  using  inverse  filtering  techniques 
for  comparing  speech  data  arises  when  substantial  modeling  of  very  low 
amplitude  parts  of  the  spectrum  takes  place.  Since  an  inverse  filter 
can  exhibit  an  extremely  high  gain  at  certain  frequencies  (especially  at 
high  frequencies),  imperceptibly  small  noise  signals  in  the  input  at 
these  frequencies  can  strongly  dominate  the  residual  energy  output  from 
the  inverse  filter. 

A solution to this problem is to add a "noise floor" to the speech
signal to prevent very high gain in the inverse filter. This is done by
adding uncorrelated white noise with energy proportional to the input
signal energy. This is implemented by multiplying the frame energy
[r(0)] by some factor (1.1, in this case) and leaving all other
autocorrelation terms unchanged. Multiplying by 1.1 corresponds to
adding 10% noise to the signal and reduces the range in spectral
amplitude to about 20 dB. This algorithm improves the performance of
the verification system in a noisy environment, as it reduces the effect
of the dissimilarity between the lower energy portions of the speech
that have been corrupted by background noise. The choice of the level
for the noise floor represents a compromise between the system
performance in low and high background noise environments. Figure 22
illustrates the effect of multiplying the zeroth-lag autocorrelation
term [r(0)] by 1.1 for a nineteenth-order LPC modeled spectrum. The
dashed curve is for r(0) increased by 10%, showing the reduction in
dynamic range of the speech signal.

The  second  modification  was  to  regress  out  gross  energy  trends 
versus  frequency,  similar  to  the  method  used  in  previous  filter  bank 
work  that  used  Legendre  polynomials  and  sine/cosine  basis  functions. 
The  method  used  here,  however,  was  to  filter  the  signal  with  its  own 
low-order  (first-order)  inverse  filter. 

The  side-effects  of  these  two  operations  are  (1)  the  reduction  of 
the  dynamic  range  of  the  spectral  amplitude  to  about  20  dB  and  (2)  the 
need  to  use  a  higher-order  LPC  analysis  to  accurately  model  the  formant 
locations.  (Instead  of  our  usual  14th-order  model,  a  21st-order  model 
was  used,  reducing  to  20th-order  after  regression.) 
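
The two preprocessing steps can be sketched directly on an autocorrelation
sequence. The following Python fragment is only an illustration under
stated assumptions: the function names are hypothetical, and the
first-order regression is written as filtering by 1 + a1*z^-1 with
a1 = -r(1)/r(0), which is one reading of "its own first-order inverse
filter." The order in which the two steps are applied is left to the
caller.

```python
def apply_noise_floor(r, factor=1.1):
    """Raise the zeroth-lag term r[0] by `factor`; leave other lags unchanged."""
    return [r[0] * factor] + list(r[1:])


def regress_first_order(r):
    """Autocorrelation after filtering by the signal's own first-order inverse filter.

    Assumed filter: 1 + a1*z^-1 with a1 = -r[1]/r[0].  Its autocorrelation is
    (a1, 1 + a1^2, a1) at lags (-1, 0, +1), so
        r'(i) = (1 + a1^2) r(i) + a1 [r(i-1) + r(i+1)].
    The highest lag cannot be formed, so the output has one fewer term.
    """
    a1 = -r[1] / r[0]
    return [
        (1.0 + a1 * a1) * r[i] + a1 * (r[abs(i - 1)] + r[i + 1])
        for i in range(len(r) - 1)
    ]


# Illustrative input only: an autocorrelation with a pure first-order tilt.
r = [1.0, 0.5, 0.25, 0.125]
flattened = regress_first_order(r)       # the tilt is removed: [0.75, 0.0, 0.0]
floored = apply_noise_floor(flattened)   # only the zeroth lag changes
```

Because the regression consumes one lag, 22 autocorrelation terms (enough
for a 21st-order model) leave 21 terms, which is consistent with the
reduction from 21st to 20th order described above.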

C.  TIME  REGISTRATION

In  order  to  compute  the  similarity  measure  between  the  incoming 
speech  and  stored  reference  patterns,  it  is  necessary  to  compensate  for 
changes  in  the  length  and  timing  of  the  input  utterance.  Two  methods  of 
time  registration  were  used  during  these  experiments. 

The first method was to compare "scanning" patterns formatted from
the input data to reference scanning patterns. This method is identical
to that used previously for filter bank parameters. These input
scanning patterns for each word were formed by using the decision
function for six frames, spaced 20 ms apart, at times of -50, -30, -10,
+10, +30, and +50 ms around each input sample time. A scanning pattern
centered about frame n thus consisted of the frame sequence (n-5, n-3,
n-1, n+1, n+3, n+5). The reference patterns were formed at the same time
intervals around the energy peak in each of the reference words. A
scanning error was then computed between each input and reference
scanning pattern. In other words, in this method there was no time
normalization for each individual word; however, the times between words
were nonlinearly scaled.

The  second  method  used  was  a  modification  of  the  basic  dynamic  pro¬ 
gramming  algorithm  used  by  Itakura,  which  performs  word  specific  time 
registration  so  as  to  maximize  the  similarity  of  input  and  reference 
data  for  each  reference  word  in  turn.  There  were  three  significant  mod¬ 
ifications  to  this  technique  as  used  in  this  experiment.  First,  and 
most  significant,  is  that  endpoints  are  unconstrained.  That  is,  there 
is  no  algorithm  that  arbitrarily  (or  otherwise)  constrains  the  dynamic 
optimization  routine  to  start  and  end  on  specific  input  frames. 
Processing  time  is  substantially  increased  by  elimination  of  these  con¬ 
straints.  However,  the  reliability  burden  on  endpoint  determination  no 
longer  exists. 

A  second  difference  is  that  the  reference  data  are  represented  only 
at  every  other  input  frame.  That  is,  the  frame  period  of  the  reference 
data  is  twice  the  frame  period  of  the  input  data.  Our  experience,  at 
least  for  word  recognition,  suggests  that  performance  is  relatively  in¬ 
sensitive  to  the  number  of  reference  frames  per  vocabulary  word  above  a 
threshold  level  of  about  20  frames  per  second.  This  frame  skipping  in 
the  reference  data  does  two  things.  First,  it  halves  the  amount  of 
reference  data  that  must  be  stored.  Second,  it  halves  the  number  of  dy¬ 
namic  programming  computations  that  must  be  performed. 

The  final  difference  is  that  penalty  errors  are  added  when  nonline¬ 
ar  warping  occurs.  Figure  23  gives  a  graphic  comparison  of  the  basic 
and  modified  dynamic  programming  techniques. 


D.  DERIVATION  OF  THE  RESIDUAL  ERROR  DECISION  FUNCTION 

The frame-to-frame comparison used in this investigation is the
energy of the residual signal left after passing an input through a
linear predictive inverse filter, normalized by the energy in the input
signal. The decision function then is the ratio of the normalized
residual energy obtained when the unknown input is passed through the
inverse filter for the reference, to the normalized residual energy
obtained when the unknown input is passed through its own inverse
filter.

In general, the autocorrelation (energy) of the output (y) of a
filter can be found* from the autocorrelation of the original signal by

Ey = r(i) * h(i) * h(-i)


* Equation 10-36 of Athanasios Papoulis, Probability, Random Variables,
and Stochastic Processes, Reference 17.
Figure 23. Comparison of Basic and Modified Dynamic Programming Algorithms

Basic Algorithm:

$$E_{N,j} = D_{N,j} + \min\left(E_{N-1,j},\; E_{N-1,j-1},\; E_{N-1,j-2}\right),$$

where $E_{N-1,j} \to \infty$ if zero lag were used to produce it.

• $D_{N,j}$ is the distance measure between reference frame N and input
frame j.

• $E_{N,j}$ is the optimum (minimum) subsequence distance up through
reference frame N, given that input frame j corresponds to reference
frame N.
where h(i) is the impulse response of the (inverse) filter, r(i) is the
autocorrelation of the input [in our case, the input has been
preemphasized [x(i) - x(i-1)] and windowed with a Hamming window], and *
indicates convolution. However, since the convolution, h(i) * h(-i), is
just the autocorrelation [p(0)] of the impulse response, Ey is just the
convolution of the autocorrelation of the input with the autocorrelation
of the impulse response.

A more detailed derivation of the expressions for these residual
energies can be given by considering passing a discrete-time input
sequence {x(n)} of length L through an inverse filter of the form

$$A(z) = 1 + \sum_{i=1}^{M} a(i)\, z^{-i} = \sum_{i=0}^{M} a(i)\, z^{-i}, \qquad \text{with } a(0) = 1.$$

The output, y(n), of this filter is x(n) + a(1)x(n-1) + ... +
a(M)x(n-M). The total energy in the filtered signal, y (i.e., the
residual energy or the energy in the residual signal), is given by


$$E_y = \sum_{n=0}^{L} \left[ \sum_{i=0}^{M} a(i)\, x(n-i) \right]^{2} = \sum_{n=0}^{L} \left[ \sum_{i=0}^{M} a(i)\, x(n-i) \right] \left[ \sum_{k=0}^{M} a(k)\, x(n-k) \right],$$

or alternatively, as

$$E_y = \sum_{i=0}^{M} \sum_{k=0}^{M} a(i)\, a(k) \sum_{n=0}^{L} x(n-i)\, x(n-k) = \sum_{i=0}^{M} \sum_{k=0}^{M} a(i)\, a(k)\, r(i-k),$$

where r(i-k) is the autocorrelation of the input signal, x. When the
input signal is the same as that used to derive the a(i)'s,*

$$\sum_{i=1}^{M} a(i)\, r(i-k) = -r(k), \qquad k = 1, 2, \ldots, M,$$


* Equation 3 of J.D. Markel, "Digital Inverse Filtering - A New Tool
for Formant Trajectory Estimation," Reference 18.
in which case, the residual energy becomes

$$E_y' = \sum_{i=0}^{M} a(i)\, r(i).$$

Both Ey and Ey' can then be normalized by dividing by r(0), the energy
in the input signal, or equivalently, by letting r(i) be a normalized
autocorrelation:

$$r(i) = \sum_{n=0}^{L-i} x(n)\, x(n+i) \Big/ \sum_{n=0}^{L} x^{2}(n).$$

Note that Ey' is just the inner product of two vectors, <a, r>. With
some manipulation, Ey can be written as

$$E_y = 2 \sum_{i=0}^{M} \left[ \sum_{j=0}^{M-i} a(j)\, a(j+i) \right] r(i) \;-\; \sum_{j=0}^{M} a^{2}(j)\, r(0).$$

If p is the column vector (p(0), p(1), ..., p(M)), where

$$p(i) = \sum_{j=0}^{M-i} a(j)\, a(j+i),$$

then Ey can also be written in inner product form as 2<p, r> -
p(0)r(0).

The  (log  of  the)  ratio  of  Ey  to  Ey'  was  proposed  by  Itakura  [15] 
for  use  as  a  decision  function  for  isolated  word  recognition  and  was 
used  by  Wakita  [19]  for  vowel  recognition. 

Wakita  points  out  that  Ey  can  represent  the  residual  energy  for  ei¬ 
ther  an  arbitrary  input  passed  through  the  inverse  filter  for  a  refer¬ 
ence,  or  a  reference  passed  through  the  inverse  filter  for  the  arbitrary 
input.  Ey'  is  the  residual  energy  for  any  input  passed  through  its  own 
inverse  filter.  The  decision  function  used  in  this  investigation  was  a 
modification  to  the  ratio  of  Ey  (an  arbitrary  input  passed  through  the 
inverse  filter  for  the  reference)  to  Ey'  (the  arbitrary  input  passed 
through  its  own  inverse  filter). 
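
The unmodified ratio Ey/Ey' can be sketched as follows. This is a minimal
Python illustration under stated assumptions, not the program used in the
investigation: the function names are hypothetical, the ordinary
Levinson recursion stands in for whatever routine produced Ey', and the
modifications to the ratio mentioned above are not included.

```python
def levinson(r, order):
    """Levinson recursion for A(z) = 1 + sum a(i) z^-i.

    Returns (a, err): the predictor coefficients with a[0] = 1 and the
    minimum residual energy Ey' for the autocorrelation sequence r.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def filter_autocorr(a):
    """p(i) = sum_{j=0}^{M-i} a(j) a(j+i) for i = 0..M."""
    m = len(a) - 1
    return [sum(a[j] * a[j + i] for j in range(m - i + 1)) for i in range(m + 1)]


def residual_energy(a_ref, r_in):
    """Ey = 2<p, r> - p(0) r(0): input passed through the reference inverse filter."""
    p = filter_autocorr(a_ref)
    return 2.0 * sum(pi * ri for pi, ri in zip(p, r_in)) - p[0] * r_in[0]


def residual_ratio(a_ref, r_in):
    """Ey / Ey', which cannot fall below one."""
    _, err_own = levinson(r_in, len(a_ref) - 1)
    return residual_energy(a_ref, r_in) / err_own


# Illustrative values only.
r_in = [1.0, 0.5, 0.25]
a_own, ey_own = levinson(r_in, 2)
matched = residual_ratio(a_own, r_in)                # reference equals input
mismatched = residual_ratio([1.0, 0.0, 0.0], r_in)   # flat reference filter
```

When the reference filter is the input's own optimal filter, the two
energies coincide and the ratio is one; any other reference filter gives
a larger value.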
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E.  THE  EXPERIMENTAL  DATA  SET 

The data set used during all of the early experiments was the same
as that used in the filter bank experiments done for testing the changes
made to the upgraded system. This set was collected prior to this
contract and consisted of one enrollment session (48 phrases) and 21
verification sessions (12 phrases each) from 11 speakers (6 males, 5
females).

The  collection  of  data  from  additional  speakers  to  expand  the 
number  of  speakers  in  the  data  set  was  completed  during  this  contract. 
These  data  also  consisted  of  one  enrollment  session  and  21  verification 
sessions  from  24  additional  speakers  (12  males,  12  females),  including 
one  pair  of  identical  twins  (males).  With  the  eleven  original  speakers, 
there  are  data  from  18  males  and  17  females.  This  gives  data  for  12,138 
(18*17*21  +  17*16*21)  type  II  trials  and  735  (35*21)  true  speaker  tri¬ 
als.  All  the  data  were  preprocessed  to  produce  the  LPC  analysis  files 
used  in  the  residual  error  speaker  verification  experiments.  However, 
due  to  the  processing  time  required  for  this  experiment,  only  two  verif¬ 
ication  sessions  from  each  speaker  were  processed  for  impostor  trials. 
Also,  in  order  to  avoid  data  dependencies  between  the  two  twins,  only 
one  twin  was  used  in  the  experiments.  This  yielded  13,056 
(12*2*2*17*16)  single-phrase  type  II  trials. 
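
The trial counts quoted above can be checked with a few lines of
arithmetic. The sketch below is illustrative; the variable names are
hypothetical, and the reading of the last product as 12 phrases, 2
sessions, and two same-sex groups of 17 speakers is an interpretation of
the factors given in the text.

```python
def ordered_pairs(n):
    """Number of (impostor, reference) pairs among n same-sex speakers."""
    return n * (n - 1)


SESSIONS = 21
MALES, FEMALES = 18, 17

# Full data set: one type II trial per impostor/reference pair per session.
type2_full = (ordered_pairs(MALES) + ordered_pairs(FEMALES)) * SESSIONS
true_full = (MALES + FEMALES) * SESSIONS

# Reduced set (assumed reading): one twin dropped, leaving 17 males and
# 17 females, with 2 sessions per speaker and 12 phrases per session.
type2_reduced = 12 * 2 * (ordered_pairs(17) + ordered_pairs(17))
```

Under this reading the three totals are 12,138, 735, and 13,056,
matching the figures in the text.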


F.  FIXED  PATTERN  TIME  NORMALIZATION  EXPERIMENTS 


The  first  scheme  used  for  time-normalization  was  to  compare  "scan¬ 
ning"  patterns  formatted  from  the  input  data  to  reference  scanning  pat¬ 
terns.  These  input  scanning  patterns  were  formed  by  using  the  decision 
function  at  times  of  -50,  -30,  -10,  +10,  +30,  and  +50  ms  around  each 
input  sample  time.  The  reference  patterns  were  formed  at  the  same  time 
intervals  around  the  energy  peak  in  each  of  the  reference  words.  A 
scanning error was then computed for each input time sample, "j", as

$$SE(j) = \frac{1}{6} \sum_{k=1}^{6} \left[ \frac{E_y(j+2k-7)}{E_y'(j+2k-7)} - 1 \right] \times 1000,$$


where  both  Ey  and  Ey'  were  calculated  using  the  modified  input  auto¬ 
correlations.  Ey  was  calculated  directly  as  explained  earlier; 
however,  Ey'  was  a  byproduct  of  a  general  subroutine  that  iteratively 
calculated  the  predictor  coefficients.  Note  that  since  the  usual  range 
of  Ey/Ey'  is  1.0  through  1.4,  (Ey/Ey')-1  is  fairly  close  to  the 
log(Ey/Ey')  used  by  Itakura.  The  effect  of  not  taking  the  log  is  to  pe¬ 
nalize  the  larger  deviations  more  severely. 
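
A minimal sketch of the scanning error computation follows. It assumes
10 ms frames, so that the offsets 2k-7 for k = 1..6 select frames -5,
-3, -1, +1, +3, and +5 around the input sample; the function name and
the layout of the arguments are hypothetical.

```python
FRAME_OFFSETS = [2 * k - 7 for k in range(1, 7)]   # -5, -3, -1, +1, +3, +5


def scanning_error(ey, ey_own, j):
    """SE(j) for one reference word.

    ey[k][f]   residual energy of input frame f through the inverse filter
               of reference pattern column k (k = 0..5)
    ey_own[f]  residual energy of input frame f through its own inverse filter
    """
    total = 0.0
    for k, offset in enumerate(FRAME_OFFSETS):
        f = j + offset
        total += ey[k][f] / ey_own[f] - 1.0
    return 1000.0 * total / 6.0


# Illustrative values only: every frame has Ey/Ey' = 1.2.
n_frames = 11
ey = [[1.2] * n_frames for _ in range(6)]
ey_own = [1.0] * n_frames
se = scanning_error(ey, ey_own, 5)   # close to 200
```

With the ratio in its usual range of 1.0 through 1.4, the scanning error
lies between 0 and about 400, which gives a sense of scale for any
threshold applied to it.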


A complete speaker verification experiment was run on the
11-speaker, 21-session data set using scanning errors (SE(j)'s). The
minima of the scanning errors for all words in a phrase were combined to
find the sequence of scanning error minima having the smallest total
sequence error according to the following equation:


$$\text{OPTSEQ ERROR} = \sum_{i=1}^{3} \bigl(1 + \alpha\, m(i)\bigr)\bigl(1 + \alpha\, m(i+1)\bigr) \left[ 1 + \beta \left( \frac{dt(i) - \mathrm{dthat}(i)}{\mathrm{dthat}(i)} \right)^{2} \right],$$

where m(i) is a (local) minimum of SE(j) for the word in position i,
dt(i) = t(i+1) - t(i) (t(i+1) must be > t(i)),
dthat(i) is the expected value of dt(i) for the speaker,
$\alpha$ = 0.01, and
$\beta$ = 1.0.


Plots of the average (e) of the m(i)'s in the sequence having the
lowest OPTSEQ ERROR for both true speakers and impostors for the
residual energy measure are shown in Figure 24 for males and Figure 25
for females. Also shown in Figures 24 and 25 are the comparable curves
using SE(j)'s derived from Euclidean distances between input and
reference spectral patterns.

Plots of the normalized errors (e/ehat) of the m(i)'s for the
residual energy measure are shown in Figure 26 for males and Figure 27
for females. Again, the curves derived from spectral SE(j)'s are also
shown in both figures.

There  are  several  important  characteristics  to  note  from  these  fig¬ 
ures: 

1.  Performance  using  the  residual  energy  measure  is  superior 
to  that  for  the  spectral  measure. 

2.  Performance  for  males  is  superior  to  that  for  females  for 
the  residual  energy  measure  for  both  e  and  e/ehat,  and  for 
the  spectral  measure  for  e. 

3.  Performance using normalized errors is better for the
spectral measure and is worse for the residual energy
measure.

4.  The  number  of  unregistered  phrases  was  lower  for  both  true 
speakers  and  impostors  using  the  residual  energy  measure 
rather  than  the  spectral  measure. 


This experiment was next rerun on the 11-speaker data base using
thresholds on the scanning error of 500 and on the OPTSEQ error of 50.
The prior experiment had no upper limits on either of these parameters.
The exciting result was the drastic increase in the number of phrases
not registered for impostors, with very little sacrifice in the number
of phrases not registered for true speakers, as shown in Table 36. Also
given in Table 36 are the equal error rates for each case, with
unregistered phrases for true speakers not counted.
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Figure  26  Speaker  Verification  Performance  for  Males  Using  Normalized  Errors 


TABLE 36. UNREGISTERED PHRASE RATES FOR IMPOSTORS/TRUE SPEAKERS
AND EQUAL ERROR RATES IN PERCENT

FEATURE TYPE:             Filter bank    Residual Error    Residual Error
THRESHOLDS?:              yes            no                yes

% UNREGISTERED
PHRASES (IM/TS):
  Males:                  80.8/1.0       4.4/0.1           93.8/0.3
  Females:                83.2/2.7       1.7/0.0           92.6/0.3
  Total:                  81.8/1.8       3.3/0.0           93.3/0.3

EERs IN % USING
UNNORMALIZED ERRORS:
  Males:                  1.74           0.73              0.60
  Females:                3.00           1.35              1.27
  Total:                  2.20           1.08              0.83

EERs IN % USING
NORMALIZED ERRORS:
  Males:                  2.01           1.12              0.93
  Females:                1.63           1.59              1.35
  Total:                  1.79           1.55              1.33



Next, two additional experiments with the eleven-speaker data set were
done. One experiment reversed the order of regressing the input and
applying the noise floor. The second reversed the roles of the input
and the reference, resulting in a decision function that was the ratio
of the residual error from the reference passed through the optimal
inverse filter for the input, to the residual error from the reference
passed through the optimal inverse filter for itself.


Table 37 compares the results of these two residual error
experiments with the results of the prior experiment (middle column of
Table 36). Note that both of these experiments yielded poorer results.
Both experiments were done without thresholds applied. Figure 28 shows
the more descriptive Type I/Type II curves for these experiments.
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TABLE  37.  UNREGISTERED  PHRASE  RATES  FOR  IMPOSTORS /TRUE  SPEAKERS 
AND  EQUAL  ERROR  RATES  IN  PERCENT 

X FILTERED BY Y:          Input by ref   Ref by input   Input by ref
REGRESS BEFORE/AFTER
NOISE FLOOR:              before         before         after
THRESHOLDS?:              no             no             no

% UNREGISTERED
PHRASES (IM/TS):
  Males:                  4.4/0.1        2.9/0.1        3.1/0.1
  Females:                1.7/0.0        0.7/0.0        0.6/0.0
  Total:                  3.3/0.0        2.1/0.1        2.1/0.0

EERs IN % USING
UNNORMALIZED ERRORS:
  Males:                  0.73           2.71           1.11
  Females:                1.35           2.62           2.50
  Total:                  1.08           2.64           2.09

EERs IN % USING
NORMALIZED ERRORS:
  Males:                  1.12           2.36           1.81
  Females:                1.59           3.18           1.35
  Total:                  1.55           3.08           1.91


F.  NONLINEAR  TIME  WARPING  EXPERIMENTS

Of even greater interest are the results of another set of
experiments. Instead of using reference patterns formatted at fixed
intervals around the four energy peaks in an input, a nonlinear time
warping with dynamic programming was used for time alignment of the
input to the reference. As expected, because of better time alignment
and larger reference patterns, there was indeed another dramatic
improvement in performance. Figures 29 and 30 compare the performance
on the 11-speaker data set using (1) the DSG filter bank and 6-column
fixed format reference patterns, (2) the residual error based, 6-column
fixed format reference patterns, and (3) the residual error based
reference patterns with nonlinear time warping of the input. Although
not shown, applying a threshold to the nonlinear time warped case shifts
the bottom of the true speaker curve to the left and flattens the top of
the impostor curve, since only 1.8% of the impostor phrases even
"register," i.e., for only 1.8% of the phrases, the match between the
input and the reference is less than a given threshold. It should be
noted that the nonlinear time warping method does require about four
times as much memory for storing reference patterns as the other two
methods.

All three of these curves are for decisions based only on one
phrase. Two experiments were run on the 11-speaker data set with a
two-phrase strategy using all pairs of the twelve phrases from each
impostor and true speaker session. The equal error rate was less than
0.1%;  however,  since  this  represents  only  one  or  two  errors,  it  is  not 
possible  to  estimate  the  actual  equal  error  rate  with  a  high  degree  of 
confidence.  This  emphasized  the  need  to  expand  the  test  data  set  from 
eleven  speakers.  The  preliminary  results,  however,  suggested  that  a 
nonlinearly  time  warped  residual  error  technique  can  achieve  the  goals 
of  <1.0%  true  speaker  error  and  <0.1%  impostor  error  called  for  in  the 
statement  of  work. 

In  the  method  for  performing  the  nonlinear  time  warping  in  this  ex¬ 
periment,  boundary  points  were  manually  chosen  between  each  of  the  four 
words  in  the  first  four  enrollment  phrases  (one  sample  of  each  of  the  16 
words),  based  primarily  on  energy  contours.  A  reference  pattern  for 
each  word  was  then  formatted  by  excising  the  LPC  parameters  every  other 
time  sample. 

The  procedure  used  on  the  remaining  sixteen  phrases  during  enroll¬ 
ment  was  the  same  as  used  during  verification.  One  "super"  reference 
pattern  was  formatted  by  concatenating  the  reference  patterns  for  the 
four  words.  This  super  reference  pattern  was  then  used  in  a  dynamic 
time-warping  routine  to  find  the  best  match  between  the  input  and  the 
reference.  In  this  procedure,  the  Itakura  type  distance  described  ear¬ 
lier  was  calculated  between  each  input  frame  and  each  reference  frame. 
Running cumulative distances were calculated for every input and
reference frame, and the minimum cumulative error across all input
frames for the last reference frame was selected and the minimum error
path was backtracked. One constraint on the calculation of the
cumulative distances was that no two input frames were allowed in any
path for the same reference frame, nor could the input frames
corresponding to adjacent reference frames be farther apart than four
10 ms frames. Also, the increment to the cumulative error between
reference frames was weighted by the distance between input frames.
This weighting was determined from 0.2 times the square of the natural
log of half the difference in the input frame numbers. This results in
the following weights:

input frame distance:    1        2        3        4

weight:                  0.096    0.0      0.033    0.096
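
The weights above, and the way they might enter the cumulative distance,
can be sketched as follows. This Python fragment is an illustration under
stated assumptions: the function names are hypothetical, the weight is
treated as an additive penalty on each step, and both the starting and
ending input frames are left free. It is not the routine used in the
experiments.

```python
import math


def warp_weight(step):
    """0.2 times the squared natural log of half the input frame distance."""
    return 0.2 * math.log(step / 2.0) ** 2


def dtw_unconstrained(dist):
    """Align a reference pattern to an input with free endpoints.

    dist[n][j] is the distance between reference frame n and input frame j.
    Each reference frame takes exactly one input frame, and the input frames
    for adjacent reference frames are 1 to 4 frames apart.
    Returns (minimum cumulative error, input frame chosen per reference frame).
    """
    n_ref, n_in = len(dist), len(dist[0])
    inf = float("inf")
    cum = [[inf] * n_in for _ in range(n_ref)]
    back = [[-1] * n_in for _ in range(n_ref)]
    cum[0] = list(dist[0])                      # any input frame may start the path
    for n in range(1, n_ref):
        for j in range(n_in):
            for step in range(1, 5):
                i = j - step
                if i < 0 or cum[n - 1][i] == inf:
                    continue
                cand = cum[n - 1][i] + dist[n][j] + warp_weight(step)  # assumed additive
                if cand < cum[n][j]:
                    cum[n][j] = cand
                    back[n][j] = i
    j = min(range(n_in), key=lambda c: cum[-1][c])   # any input frame may end the path
    best = cum[-1][j]
    path = [j]
    for n in range(n_ref - 1, 0, -1):
        j = back[n][j]
        path.append(j)
    path.reverse()
    return best, path


# Illustrative distances only: a perfect match along input frames 1, 3, 5.
dist = [[0.0 if j == 2 * n + 1 else 1.0 for j in range(8)] for n in range(3)]
best, path = dtw_unconstrained(dist)
```

A step of two input frames per reference frame carries no penalty, which
fits a reference frame period that is twice the input frame period.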


The  warped  input  patterns  for  each  of  the  four  words  were  then 
averaged  in  with  the  reference  pattern  in  a  1/2,  1/3,  1/4,  and  1/5  pro¬ 
portion,  depending  on  whether  the  input  was  from  the  2nd,  3rd,  4th,  or 
5th  repetition  of  the  word.  The  number  of  time  samples  in  the  reference 
pattern  did  not  change  after  the  initial  patterns  had  been  established. 
Since no thresholds were used during enrollment, every phrase was
registered.
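
The 1/2, 1/3, 1/4, 1/5 proportions amount to a running average over the
repetitions of a word. A minimal sketch, assuming each reference frame is
simply a list of numbers (the report does not say here which parameters
were averaged) and using a hypothetical function name:

```python
def update_reference(ref, warped, repetition):
    """Average a warped input pattern into the reference pattern.

    ref, warped  lists of frames of equal length; each frame is a list of values
    repetition   2 for the 2nd repetition of the word, 3 for the 3rd, and so on
    The input is weighted by 1/repetition, so the result is the plain mean of
    all repetitions seen so far.  The number of frames does not change.
    """
    w = 1.0 / repetition
    return [
        [(1.0 - w) * r + w * x for r, x in zip(ref_frame, in_frame)]
        for ref_frame, in_frame in zip(ref, warped)
    ]


# Illustrative values only: three repetitions of a one-frame, one-value pattern.
ref = [[0.0]]
ref = update_reference(ref, [[2.0]], 2)
ref = update_reference(ref, [[4.0]], 3)
```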

During  verification,  an  extra  calculation  was  added,  since  the  cu¬ 
mulative  error  in  the  dynamically  time-warped  path  was  not  the  score 
used  in  the  decision.  Instead,  the  score  from  each  selected  frame  of 
the  input  was  weighted  by  the  clipped  energy  of  the  frame.  This  gave 
less  weight  to  the  lower  energy  frames,  while  the  clipping  prevented  the 
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high  energy  vowels  from  totally  dominating  the  decision  function. 
Hence, the actual decision function was given by

$$D = \frac{\sum_{k} d(k)\, \min\bigl(e(k),\, 2\langle e \rangle\bigr)}{\sum_{k} \min\bigl(e(k),\, 2\langle e \rangle\bigr)},$$

where d(k) was the difference between reference frame k and the selected
frame of the input data; e(k) was the energy in the preemphasized,
windowed input frame (the square root of p(0) divided by the number of
samples in the input frame) corresponding to frame k; and <e> was the
average of all of the e(k) over the input utterance.
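
A minimal sketch of this clipped-energy weighting follows. The function
name and argument layout are hypothetical, and the average energy is
passed in separately because it is taken over the whole input utterance
rather than over the selected frames only.

```python
def decision_score(d, e, mean_energy):
    """Energy-weighted average of frame distances with clipped weights.

    d[k]         distance between reference frame k and its selected input frame
    e[k]         energy of that input frame
    mean_energy  average frame energy <e> over the input utterance
    """
    weights = [min(ek, 2.0 * mean_energy) for ek in e]
    return sum(dk * wk for dk, wk in zip(d, weights)) / sum(weights)


# Illustrative values only: the second frame's energy is clipped from 10 to 4.
score = decision_score([1.0, 3.0], [1.0, 10.0], mean_energy=2.0)
```

Without clipping, the high-energy frame in this example would carry ten
times the weight of the other frame; with clipping it carries four
times.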


G.  AUTOMATIC  ENROLLMENT 

Programs  were  next  developed  to  do  automatic  (no  manual  interven¬ 
tion)  enrollment  with  residual  error  data,  and  an  experiment  using  the 
11-speaker  data  set  was  run  to  determine  type  I/type  II  performance. 
The  pleasing  result  of  this  experiment  was  that  there  was  no  degradation 
in  performance  using  automatic  enrollment. 

The  change  made  in  this  experiment  was  to  automate  the  selection  of 
boundary  points  between  words,  a  task  that  previously  had  been  done  man¬ 
ually.  The  rest  of  the  algorithm  remained  unchanged.  In  this  automatic 
enrollment,  the  four  monosyllabic  words  were  first  located  by  finding 
the  four  largest  peaks  in  a  "smoothed"  energy  function,  defined  as  the 
sum  of  a  function  of  the  RMS  energies  from  six  (10  ms)  frames  before  the 
current  frame  through  six  frames  after  the  current  frame.  In  the  filter 
bank  approach,  the  RMS  energy  used  was  a  function  only  of  filters  3 
through  12,  which  eliminated  those  portions  of  the  spectrum  having  high 
energies  during  either  nasals  (the  low  frequencies)  or  sibilants  (the 
high  frequencies) .  Since  in  the  LPC  analysis  being  used  in  these  exper¬ 
iments,  we  have  only  autocorrelations  of  the  inputs,  another  method  must 
be  found  to  eliminate  the  contributions  to  the  energy  from  both  the  very 
low  and  the  very  high  frequencies.  This  could  have  been  accomplished  by 
passing  the  original  signal  through  a  bandpass  filter  before  calculating 
the autocorrelations. Alternatively, the autocorrelation (r'(i)) of
such a bandpassed signal can be found from the autocorrelation of the
original signal by the same technique used to calculate the residual
energy out of the inverse filter as explained earlier. This can be done
since

r'(i) = r(i) * h(i) * h(-i),

where h(i) is the impulse response of the bandpass filter, r(i) is the
autocorrelation of the input [in our case, the input has been
preemphasized [x(i) - x(i-1)] and windowed with a Hamming window], and *
indicates convolution. Remember, however, that since the convolution,
h(i) * h(-i), is just the autocorrelation [p(0)] of the impulse
response, r'(i) is just the convolution of the autocorrelation of the
input with the autocorrelation of the impulse response. A simple filter
to eliminate the energies at the low and high ends would be one with a
zero at 0 Hz and two zeros at 5000 Hz. The transfer function for such a
filter is

$$H(z) = \left(1 - 2\cos(a)\, z^{-1} + z^{-2}\right)\left(1 - z^{-1}\right),$$

where "a" is two pi times 5000 Hz divided by the sample frequency (12500
Hz), yielding 2cos(a) = -1.618034. This corresponds to an impulse
response, h(t), of

h(t) = del(t) - [1+2cos(a)] del(t-1) + [1+2cos(a)] del(t-2) - del(t-3)

     = del(t) + 0.618034 del(t-1) - 0.618034 del(t-2) - del(t-3),

where "del" is the delta function. h(t) * h(-t) then becomes

h(t)*h(-t) = - del(t-3) - 1.236068 del(t-2) + 0.854102 del(t-1)

             + 2.763932 del(t) + 0.854102 del(t+1)

             - 1.236068 del(t+2) - del(t+3).

Substituting this into the equation for r(i)', with i = 0, gives

r(0)' = 2.763932 r(0) + 2 [0.854102 r(1) - 1.236068 r(2) - r(3)]

      = 2.763932 r(0) + 1.7082 r(1) - 2.4721 r(2) - 2 r(3).

As  a  final  step,  r(0)'  is  weighted  by  (1  +  r(l)/r(0)),  a  function  which 
varies  linearly  with  the  mean  frequency  of  the  input  signal  from  two  at 
0  Hz  to  zero  at  the  half  sample  frequency  (6250  Hz) .  The  "smoothed"  en¬ 
ergy  actually  used  is  then  the  square  root  of  0.01  times  this  value. 
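
The coefficients above can be reproduced numerically. The following
Python sketch is illustrative only: the names are hypothetical, and the
last function reflects one reading of the final weighting and scaling
step for a single frame (it does not include the sum over neighboring
frames).

```python
import math

SAMPLE_RATE_HZ = 12500.0
ZERO_HZ = 5000.0

two_cos_a = 2.0 * math.cos(2.0 * math.pi * ZERO_HZ / SAMPLE_RATE_HZ)   # about -1.618034

# Impulse response of (1 - 2cos(a) z^-1 + z^-2)(1 - z^-1).
h = [1.0, -(1.0 + two_cos_a), 1.0 + two_cos_a, -1.0]


def autocorr_of_filter(taps):
    """Lags 0..len(taps)-1 of taps(t) * taps(-t)."""
    n = len(taps)
    return [sum(taps[j] * taps[j + i] for j in range(n - i)) for i in range(n)]


P = autocorr_of_filter(h)   # about [2.763932, 0.854102, -1.236068, -1.0]


def bandpassed_energy(r):
    """Zero-lag autocorrelation of the bandpassed signal from r(0)..r(3)."""
    return P[0] * r[0] + 2.0 * sum(P[i] * r[i] for i in range(1, 4))


def frame_energy(r):
    """Assumed reading: sqrt(0.01 * r'(0) * (1 + r(1)/r(0))) for one frame."""
    weighted = bandpassed_energy(r) * (1.0 + r[1] / r[0])
    return math.sqrt(0.01 * max(weighted, 0.0))
```

As a sanity check on the filter, a constant (0 Hz) input has r(i) equal
for all lags, and the bandpassed energy then vanishes, as expected for a
filter with a zero at 0 Hz.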

This "smoothed" energy is input to a peak and valley finding program
that finds all peaks in the energy that have a value that is at least
500 and that is greater than 1.3 times the value of the preceding
valley. The four largest peaks (independent of time differences between
peaks) are retained as locations of the vowels in each of the four
words. The beginning and end of the utterance for purposes of subsequent
processing are determined as the times before the first energy peak and
after the last energy peak, where the energy first drops below 25% of
the peak value across the entire utterance. The boundaries between words
are chosen as the valleys between the four largest energy peaks.
Starting at the first 10 ms frame after the beginning of the utterance,
reference patterns for each of the four words are built by extracting
every other frame from the input and assigning it to the word defined by
the valley point times. This is done for each of the first four phrases,
which defines a reference pattern for each of the sixteen words.
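The peak and valley search described above might be sketched as follows. This is an illustration under stated assumptions rather than the original program: the function names are hypothetical, a "peak" is taken to be a local maximum of the frame-by-frame energy, and the "preceding valley" is taken to be the lowest energy seen since the last accepted peak.

```python
def find_vowel_peaks(energy, min_peak=500.0, ratio=1.3, n_keep=4):
    """Return frame indices of the n_keep largest qualifying peaks, in time order."""
    peaks = []                    # (value, frame index) of accepted peaks
    valley = energy[0]            # lowest value since the last accepted peak
    for i in range(1, len(energy) - 1):
        valley = min(valley, energy[i])
        is_local_max = energy[i - 1] < energy[i] >= energy[i + 1]
        if is_local_max and energy[i] >= min_peak and energy[i] > ratio * valley:
            peaks.append((energy[i], i))
            valley = energy[i]    # restart the valley search after this peak
    largest = sorted(peaks, reverse=True)[:n_keep]
    return sorted(i for _, i in largest)


def word_boundaries(energy, peak_frames):
    """Word boundaries: the lowest-energy frame between consecutive peaks."""
    bounds = []
    for left, right in zip(peak_frames, peak_frames[1:]):
        segment = energy[left:right + 1]
        bounds.append(left + min(range(len(segment)), key=segment.__getitem__))
    return bounds
```

With four retained peaks, `word_boundaries` returns the three valleys that separate the four words of a phrase.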


One step has been added to this procedure to refine the word boundary
locations in the third and fourth phrases. The word order for the first
four phrases ensures that all possible phoneme transitions between words
will have occurred in the first two phrases. Hence, a "mini-pattern" is
formed from the last three samples of the reference pattern for one word
and the first three samples of the reference pattern for the following
word, and is used to scan plus and minus twenty 10 ms frames around the
valleys found in the input for the third and fourth phrases to refine
the selected boundary locations between words and make them more
consistent with those selected from the first two phrases. The selected
boundaries between words in the third and fourth phrases are then moved
from the valley points in the energy to the times of minimum scanning
error found using the mini-patterns. Using these newly defined
boundaries, reference patterns for these four new words are then
selected as in the first two phrases.
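The mini-pattern scan can be illustrated with a short sketch. Several details here are assumptions not given in the text: the function name, the use of a summed squared difference as the "scanning error", the alignment of the six mini-pattern frames three before and three after the candidate boundary, and the spacing of two input frames between mini-pattern samples (because reference patterns keep every other frame).

```python
def refine_boundary(frames, mini_pattern, valley, search=20, step=2):
    """Move a word boundary from the energy valley to the best mini-pattern match.

    frames       : list of feature vectors, one per 10 ms input frame
    mini_pattern : six reference vectors (last three of one word, first
                   three of the next)
    valley       : frame index of the energy valley between the two words
    """
    offsets = [k * step for k in (-3, -2, -1, 0, 1, 2)]
    best_frame, best_err = valley, float("inf")
    for c in range(valley - search, valley + search + 1):
        if c + offsets[0] < 0 or c + offsets[-1] >= len(frames):
            continue              # mini-pattern would run off the utterance
        err = sum((x - y) ** 2
                  for off, ref in zip(offsets, mini_pattern)
                  for x, y in zip(frames[c + off], ref))
        if err < best_err:
            best_frame, best_err = c, err
    return best_frame
```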

The  processing  for  all  enrollment  phrases  after  the  fourth  phrase 
and  for  all  verification  trials  remained  the  same  as  described  above  for 
the  case  of  manual  enrollment. 


H.  EXPANDED  DATA  SET  EXPERIMENTS 

Next, to increase the confidence level of the results obtained in the
prior experiments, the expanded, 34-speaker (17 males, 17 females, with
only one twin used) data set was run using the automatic enrollment,
nonlinearly time warped method. As expected, the performance on the
larger data set was not as good as for the 11-speaker set, as can be
seen from Table 38. The last two columns of Table 38 are for the
expanded data set, with the last column showing the performance using a
threshold large enough to allow almost all phrases to register. Note
that since neither twin's data were consistent, the true speaker
rejection rate was very high for both of the twins. The last column of
Table 38 shows the equal error rates when both twins were excluded, as
well as when only one twin was excluded.

TABLE 38. EQUAL ERROR RATES (EERs) IN PERCENT AND UNREGISTERED PHRASE
RATES FOR IMPOSTORS/TRUE SPEAKERS FOR A ONE-PHRASE DECISION USING
A NONLINEARLY TIME WARPED, RESIDUAL ENERGY FEATURE VECTOR

ENROLLMENT:         MANUAL     MANUAL     AUTOMATIC  AUTOMATIC  AUTOMATIC
NO. OF SPEAKERS:    11         11         11         34         34
THRESHOLD:          9E9        1.25       1.25       1.25       2.00

EERs IN %:
  Males:            0.595      0.556      0.486      0.804      1.517 (1.090)*
  Females:          0.417      0.318      0.318      0.422      0.628
  Total:            0.541      0.470      0.375      0.628      1.078 (0.836)*

EER THRESHOLDS:
  Males:            209        212        210        229        245 (233)*
  Females:          240        237        233        226        230
  Total:            221        221        216        228        240 (232)*

% UNREGISTERED
PHRASES (IM/TS):
  Males:            0.0/0.0    97.9/0.1   98.1/0.1   98.3/1.3   3.2/0.0
  Females:          0.0/0.0    99.4/0.3   99.4/0.4   98.7/0.4   1.2/0.0
  Total:            0.0/0.0    98.5/0.2   98.6/0.3   98.5/0.8   2.2/0.0

* Values when both twins excluded. Unstarred values include only one
of the two twins in the 35-speaker data set.


Additional experiments were run on this expanded data set to determine
the effects of regression and model order. These results are shown in
Table 39. Note that females had roughly three times the error rate when
the regression was eliminated with relatively minor effect for the
males. In contrast, although the error rate did increase when the model
order was reduced, the effect was not nearly so great as when the
regression was eliminated.

TABLE 39. EQUAL ERROR RATES IN PERCENT FOR ONE-PHRASE DECISION USING
NONLINEARLY TIME WARPED, RESIDUAL ENERGY FEATURE VECTOR
FOR VARIOUS REGRESSION/LPC ORDER COMBINATIONS

ENROLLMENT:         AUTOMATIC   AUTOMATIC   AUTOMATIC   AUTOMATIC
NO. OF SPEAKERS:    34          34          34          34
THRESHOLD:          2.0         2.0         2.0         2.0
LPC ORDER**:        21          20          15          14
REGRESS ORDER:      1           0           1           0

EERs IN %:
  Males:            1.517       1.532       1.746       2.148
                    (1.090)*    (1.364)*    (1.351)*    (1.885)*
  Females:          0.628       1.564       0.654       1.808
  Total:            1.078       2.089       1.214       2.358
                    (0.836)*    (1.941)*    (1.010)*    (2.297)*

EER THRESHOLDS:
  Males:            245         196         211         170
                    (233)*      (193)*      (204)*      (168)*
  Females:          230         160         201         139
  Total:            240         175         208         153
                    (232)*      (193)*      (203)*      (168)*

* Values when both twins excluded. Unstarred values include only
one of the two twins in the 35-speaker data set.

** LPC order before regression.


The final experiment performed with a nonsequential decision strategy
using the expanded data set was to test the results for a multiphrase
(but nonsequential) decision. These results are shown in Table 40 for
the 34-speaker data set (i.e., one twin excluded) for both the tight
(1.25) and the loose (2.0) thresholds.

TABLE 40. EQUAL ERROR RATES IN PERCENT AS A FUNCTION OF NUMBER OF PHRASES
FOR NONLINEARLY TIME WARPED, RESIDUAL ENERGY FEATURE VECTORS

ENROLLMENT:        AUTOMATIC                      AUTOMATIC
NO. OF SPEAKERS:   34                             34
THRESHOLD:         1.25                           2.0
LPC ORDER*:        21                             21
REGRESS ORDER:     1                              1

NO. OF PHRASES:    1      2      3      4         1      2      3      4

EERs IN %:
  Males            0.804  0.169  0.077  0.031     1.517  0.797  0.613  0.475
  Females          0.422  0.141  0.094  0.031     0.628  0.322  0.257  0.153
  Total            0.628  0.191  0.095  0.024     1.078  0.572  0.421  0.397

EER THRESHOLDS:
  Males            229    220    216    212       245    250    253    260
  Females          226    213    211    210       230    234    233    225
  Total            228    218    213    211       240    242    243    241

* LPC order before regression.


I.  SEQUENTIAL  DECISION  STRATEGY  EXPERIMENTS 

Finally, the 34-speaker data set was run using the sequential decision
strategy employed in the current CIC system and in the VVU system. Of
course, since these experiments used different data representations and
preprocessings, the actual parameters used in the decision were
different; however, the global strategy was identical. The phrase
acceptance threshold for "registering" a phrase was 2.0 (the same as
that used in the prior section), which resulted in all phrases being
registered for both true speakers and impostors.

The results of these full decision strategy experiments are shown in
Table 41 using three different sets of decision thresholds and allowing
a maximum of either one or two repeats of any particular phrase. The
number of true speaker trials was 714 (34 speakers * 21 sessions).
However, in order to average out some of the phrase order dependence,
the choice of the initial phrase was rotated around the 12 phrases, with
the next eleven phrases following circularly after the initial phrase.
Hence, there were actually 8,568 true speaker trials (714 * 12). The
number of impostor trials was 1088 (17 impostors * 16 references * 2
sessions * 2 sexes). The same rotating of the choice for the initial
phrase was done for the impostor data, resulting in 13,056 total
impostor trials (1088 * 12). Note that all of these results are much
better than even the desired goals given in the contract statement of
work (1% true speaker rejection and 0.1% impostor acceptance).

TABLE 41. TRUE SPEAKER (TS) REJECTS AND IMPOSTOR (IM) ACCEPTANCES
FOR A FULL "CIC," MULTIPHRASE DECISION STRATEGY USING
A NONLINEARLY TIME WARPED, RESIDUAL ENERGY DECISION FUNCTION

TEST CONDITIONS:

ENROLLMENT              AUTOMATIC  AUTOMATIC  AUTOMATIC  AUTOMATIC  AUTOMATIC
NO. OF SPEAKERS         34         34         34         34         34
NO. TS TRIALS           8568       8568       8568       8568       8568
NO. IM TRIALS           13056      13056      13056      13056      13056
PHR. ACCEPT. THRESHOLD  2.00       2.00       2.00       2.00       2.00
MAX # PHRASES           7          7          7          7          7
MAX # REPEATS           1          2          1          2          1

VERIFICATION DECISION THRESHOLDS:

                        SET 1      SET 1      SET 2      SET 2      SET 3
NORMAL MODE:
  1 PHRASE:             1.15       1.15       1.15       1.15       1.15
  2 PHRASES:            1.165      1.165      1.17       1.17       1.17
  3 PHRASES:            1.18       1.18       1.19       1.19       1.19
  4 PHRASES:            1.195      1.195      1.21       1.21       1.20
PREJUDICED MODE:
  1 PHRASE:             1.135      1.135      1.12       1.12       1.14
  2 PHRASES:            1.155      1.155      1.15       1.15       1.16
  3 PHRASES:            1.175      1.175      1.18       1.18       1.18
  4 PHRASES:            1.195      1.195      1.21       1.21       1.20

TEST RESULTS:

NO. OF TS REJECTS:
  Males:                47         37         30         28         38
  Females:              12         15         3          6          11
  Total:                59         52         33         34         49

NO. OF IM ACCEPTANCES:
  Males:                3          8          9          10         5
  Females:              0          0          3          2          0
  Total:                3          8          12         12         5

TOTAL (MALES & FEMALES)
ERROR RATES IN %:
  TS Rejects:           0.689      0.607      0.385      0.397      0.572
  IM Accepts:           0.023      0.061      0.092      0.092      0.038

* Only one of the twins in the 35-speaker data set was included;
however, when impostor trials were run between twins, there were
no successful impersonations.


Appendix I

ENTRY  CONTROL  POINT  SIMULATION  MODEL 


There  are  two  primary  parts  to  the  simulation  model  for  an  entry 
control  point: 

o  a  user  arrival  rate  model  and 
o  a  booth  service  time  model. 

The  simulation  model  developed  in  this  study  accounts  for  a  variable  number  of 
booths  at  a  single  entry  control  point,  arrivals  of  groups  of  users,  and 
arrival  rates  that  vary  according  to  the  time  of  day.  Although  allowing 
multiple  people  in  the  booth  at  one  time  is  a  desirable  feature  for  improving 
throughput, only one occupant at a time was allowed as required in the BISS
specification (reference 1).

1.  User Arrival Rate Model

The queuing model used for user arrival rates is that groups of people
arrive at times that are exponentially distributed. The exponential
distribution (reference 2) of arrival times requires that the probability
that an arrival will occur in a small time interval is very small and that
the occurrence of an arrival is statistically independent of the occurrence
of other arrivals. This distribution is given by

$$f(t) = \frac{1}{b}\,\exp(-t/b),$$


1.  Segment Specification for Entry Control Subsystem, AFSC-ESD, Base and
Installation Security System Program Office, Spec. No. BISS-ENC-14000, 15
August 1977.

2.  T. H. Naylor, et al., Computer Simulation Techniques (John Wiley and
Sons, New York, 1966).

where "b" is the mean arrival rate for groups of people. Groups, rather than
individuals, must be modeled as exponentially distributed because the
arrival of an individual is not necessarily independent of other arrivals.
This is obvious from observations of people leaving a common work area
together or delaying their arrival by a few seconds in order to walk to the
entry control point with other individuals. Since mean arrival rates in the
BISS specification are in terms of people per minute, rather than groups per
minute, "b" must be defined in terms of the probabilities of occurrence of
various sized groups. Estimates of the probabilities of occurrence were
obtained by measuring entries and exits of employees at the Semiconductor
Building of Texas Instruments in Dallas. These probabilities are as follows:

n = number of people in group:             1     2     3     4     5     >5
p(n) = p(number of people in group = n):   0.90  0.05  0.03  0.01  0.01  0

If "a" is the mean arrival rate for people per minute, then "b" is given by

$$b = a/1.18,$$

where 1.18 is the mean group size, i.e., the sum of n p(n) over the group
sizes listed above.
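The factor of 1.18 follows directly from the measured group-size probabilities. A minimal sketch of the arithmetic (the names below are illustrative, not from the report):

```python
# Group-size probabilities measured at the Semiconductor Building (from the text).
GROUP_SIZE_PROB = {1: 0.90, 2: 0.05, 3: 0.03, 4: 0.01, 5: 0.01}


def mean_group_size(prob=GROUP_SIZE_PROB):
    """Expected number of people per arriving group."""
    return sum(n * p for n, p in prob.items())


def group_arrival_rate(people_per_min, prob=GROUP_SIZE_PROB):
    """Convert a people-per-minute rate "a" into a groups-per-minute rate "b"."""
    return people_per_min / mean_group_size(prob)
```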


The mean arrival rates used in this model are those given in the BISS
specification. These arrival rates (ARs) for entrants and extrants are
reproduced in Figures 1 and 2, respectively, and are shown in tabular form in
Table 1, where rates between table entries are found by linear interpolation.
The BISS specification states, however, that only the activity over the period
0600 to 1800 hours should be used to demonstrate that the average throughput
specification is met (three people per minute for voice authentication on both
entrance and exit and four people per minute for voice authentication only on
entrance) and that the maximum waiting time is not exceeded (three minutes at
the 95th percentile). It should be noted, however, that these authors
consider a maximum waiting time of three minutes at the 95th percentile to be
sufficiently long to doom such an automated entry control system to failure

Figure 1  Arrival Rates for Entry


Table 1

Arrival Rates for Entrants and Extrants

  Time      Time             Arrival Rates (People/Min)
  Index     (Hours)          Entrants      Extrants

    0       0030 - 0530      0.3           0.3
  - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    1       0600             0.3           0.3
    2       0630             2.15          0.3
    3       0700             4.0           0.3
    4       0730             4.0           0.5
    5       0800             2.0           1.0
    6       0830             1.5           1.0
    7       0900             1.0           1.5
    8       0930             1.0           1.0
    9       1000             1.0           1.0
   10       1030             1.0           1.5
   11       1100             1.0           1.5
   12       1130             1.0           3.0
   13       1200             2.0           2.0
   14       1230             1.0           2.0
   15       1300             3.0           2.0
   16       1330             2.0           1.5
   17       1400             1.0           1.0
   18       1430             1.0           1.0
   19       1500             1.0           1.0
   20       1530             1.0           1.0
   21       1600             2.0           3.0
   22       1630             1.0           5.0
   23       1700             0.3           2.0
   24       1730             0.3           1.15
  - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   25       1800 - 2400      0.3           0.3

due  to  lack  of  user  acceptance.  A  maximum  waiting  time  of  30  seconds  at  the 
95th  percentile  would  be  more  reasonable.  Another  important  point  is  that  the 
entry  and  exit  rate  profiles  are  themselves  averages  that  vary  according  to 
day  of  the  week,  time  of  year,  etc.  Since  using  such  averages  in  finding  95th 
or  98th  percentile  is  imprudent  at  best,  allowing  such  lax  requirements  as 
three  minutes  at  the  95th  percentile  is  certainly  not  appropriate. 


The  arrival  rates  (ARs)  shown  in  Figures  1  and  2  and  in  Table  1  are  for  a 
specified  population  size.  For  populations  of  other  sizes  the  entire  entry 
and  exit  rate  profiles  must  be  multiplied  by  an  amplitude  factor  (AF)  as 
specified  in  Table  2. 


Table 2

Entrant/Extrant Amplitude Factors for Various User Populations

  User            Amplitude*
  Population      Factor (AF)

   500            0.37
  1000            0.52
  2000            0.80
  2500            0.94
  3000            1.09
  4000            1.37
  5000            1.66
*  Derived from straight line fit for site 1 of bases A, B, and C; p. 172d of
BISS specification. Equation for line is approximately AF = (number of
users/3500) + 0.23.

Hence, "b," the mean arrival rate at a specified time "t" (in minutes) into
a twelve hour day with 0600 hours corresponding to t = 0, is found by

$$b(t) = \{AR(T) + [AR(T+1) - AR(T)] \cdot RATIO\} \cdot AF / 1.18,$$

where RATIO = fractional part of t/30,

T = (integer part of t/30) + 1,

T + 1 = (T modulo 24) + 1,

and AR(T) is specified between the dashed lines in Table 1.
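The interpolation for b(t) can be sketched as below, using the entrant rates of Table 1 for time indices 1 through 24. The function names are hypothetical. The sampling helper additionally assumes that b(t) is treated as a rate in groups per minute, so that the mean time between group arrivals is 1/b(t) minutes; that reading of "b" is an interpretation, not something the text states explicitly.

```python
import random

# Entrant arrival rates AR(T), T = 1..24 (0600 through 1730), from Table 1.
ENTRANT_AR = [0.3, 2.15, 4.0, 4.0, 2.0, 1.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
              2.0, 1.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 0.3, 0.3]
MEAN_GROUP_SIZE = 1.18


def group_rate(t, ar=ENTRANT_AR, af=1.0):
    """Mean group arrival rate b(t); t is minutes after 0600, 0 <= t < 720."""
    whole = int(t // 30)
    ratio = t / 30.0 - whole                # fractional part of t/30
    T = whole + 1                           # 1-based time index
    T_next = (T % 24) + 1                   # wraps from 24 back to 1
    rate = ar[T - 1] + (ar[T_next - 1] - ar[T - 1]) * ratio
    return rate * af / MEAN_GROUP_SIZE


def next_group_gap(t, rng, ar=ENTRANT_AR, af=1.0):
    """Assumed sampling step: exponential gap (minutes) with rate b(t)."""
    return rng.expovariate(group_rate(t, ar, af))
```

For example, `group_rate(15)` interpolates halfway between the 0600 and 0630 entries before scaling by AF and the mean group size.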

2.  Booth  Service  Time  Model 

The booth service (or occupancy) time model is the second part to the
model for an entry control point. This occupancy time is determined as the
time required to open the module door, enter the module, allow the door to
close, enter an ID number, verify your identity, and exit the booth allowing
the door to close behind you. The model for this operation is given in the
BISS specification, which describes the service time by a probability density
function of the form:

$$P(t) = \frac{t - b}{a^{2}}\,\exp\left[-\frac{t - b}{a}\right], \qquad (1)$$

where the service time parameters (a and b) are given in Table 3. These
parameters were derived assuming (1) no verification on exit; and (2) that
the ID entry device was located outside the booth, allowing the time for ID
entry to be masked if the booth were occupied. Hence, the entries in Table 3
modeled the convolution of times for the events specified in Table 4, each of
which was assumed to be exponentially distributed, except for the dead time.

Table 3

Booth Service Time Parameters

  Type of     State of               Parameter Values
  Traffic     Pedestrian Module      (Seconds)
                                     a         b

  Entry       Unoccupied             4.0       11.0
  Entry       Occupied               3.5        9.5
  Exit        Unoccupied             2.5        7.0
  Exit        Occupied               2.0        5.5

Table 4

Components of Booth Service Time Model

  Type of     State of
  Traffic     Pedestrian Module      Components

  Entry       Unoccupied             Two Doors, ID Entry, Verification, Dead Time
  Entry       Occupied               Two Doors, Verification, Dead Time
  Exit        Unoccupied             Two Doors, ID Entry, Dead Time
  Exit        Occupied               Two Doors, Dead Time

Since the booth under consideration has the ID entry internal to the
booth, only the first and third entries in Table 4 are of interest. There are
other states, however, which need to be modeled that are not included in
Tables 3 and 4. One set of these other states accounts for a "short" entry or
exit, which occurs when a person or group is waiting to enter the booth by the
same door through which a person is exiting; in this case the door time is
counted only once. A flow chart showing the use of short transactions is
given in Figure 3. (If multiple occupants were allowed, another set of states
would be required to account for overlapping usage of the doors.) A revised
list of the components of the booth service time model and the associated
parameter values is given in Table 5. The new parameter values were derived
using the same procedure as used for the original parameters, which fits
Equation (1) to a more complicated expression derived using Laplace
transforms. This booth service time model, using Equation (1) and the
parameters in Table 5, then reduces to generating random deviates that fit
the equation. Equation (1) is an Erlang distribution for which random
deviates may be generated (reference 3) by

$$t = b - a \log\left[\prod_{i=1}^{2} r_i\right],$$

where r_i is a rectangular variate with range 0,1.
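Generating a service time from Equation (1) takes only two uniform draws. The sketch below assumes a hypothetical function name and draws the rectangular variates from (0, 1] so that the logarithm is always defined; the mean of the generated times should approach b + 2a.

```python
import math
import random


def booth_service_time(a, b, rng=random):
    """Shifted two-stage Erlang deviate: t = b - a*log(r1*r2)."""
    r1 = 1.0 - rng.random()     # in (0, 1], avoids log(0)
    r2 = 1.0 - rng.random()
    return b - a * math.log(r1 * r2)
```

For instance, `booth_service_time(4.0, 11.0)` samples a service time with a = 4.0 s and b = 11.0 s, whose long-run average is 19 s.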

Using Table 5 as the basis for modeling the booth also allows the
verification to be modeled separately and added to the booth time derived
using the second and fourth entries in Table 5. The necessity for modeling
the booth separately arises since user rejections have been neglected in the
BISS specification model. To account for this, a model for verification has
been established in which the time for each phrase to be said is modeled with
a fourth order chi-squared ($\chi^2$) distribution, and a "not verified"
decision was accompanied by its own fourth order $\chi^2$ distribution. The
number of phrases, N, required for verification was determined such that
p(verified in N - 1 phrases or less) < r < p(verified in N phrases or less),
where r is a rectangular variate with range 0,1. Obviously, r > p(verified in
NMAX phrases or less) is the "not verified" case. The probabilities used in
this model (shown in Table 6) were taken from Table LIV of Volume 2 of the
BISS test results (reference 4), assuming 1% not verified.


3.  N.A.J. Hastings and J. B. Peacock, Statistical Distributions (Butterworth,
London, 1974).


Figure 3  Overall Flow Diagram for Entry Control Point Simulation Model


Table 5

Revised Booth Service Time Parameters

  Type of       Verification                                                 Parameter Values
  Transaction   Required       Components                                    (Seconds)
                                                                             a        b

  Long          Yes            Two Doors, ID Entry, Verification, Dead Time  4.0      11.0
  Long          No             Two Doors, ID Entry, Dead Time                2.5       7.0
  Short         Yes            One Door, ID Entry, Verification, Dead Time   3.15      8.5
  Short         No             One Door, ID Entry, Dead Time                 1.35      4.5


Table 6

Cumulative Verification Probabilities

  Number of       P (Verified in N
  Phrases: N      Phrases or Less)

   1              0.613
   2              0.864
   3              0.946
   4              0.970
   5              0.973
   6              0.975
   7              0.979
   8              0.987
   9              0.990
  10              0.990
  11              0.990
  12 (NMAX)       0.990

The $\chi^2$ variable for a phrase prompt and response must be translated and
scaled such that it is of the form

$$p(t) = \frac{1}{4a}\left[\frac{t - t_{min}(ph)}{a}\right]
\exp\left\{-\frac{t - t_{min}(ph)}{2a}\right\}. \qquad (2)$$

For $\chi^2$ variables, the mean is $\nu$ (= 4 in this case), and the
standard deviation is $\sqrt{2\nu}$. Using an average prompting time of
1.9 s, average response time of 1.93 s, and standard deviation of response
time of 0.4 s (p. 129 of reference 4), we have

$$\nu = [t_{ave}(ph) - t_{min}(ph)]/a = [3.83 - t_{min}(ph)]/a$$

and

$$\sqrt{8} = \text{stand. dev.}/a = 0.4\,(4)/[3.83 - t_{min}(ph)],$$

yielding $t_{min}(ph) = 3.26$ and

$$a = \frac{t_{ave}(ph) - t_{min}(ph)}{4} = 0.1425.$$

4.  Martin J. Foodman, "Test Results - Advanced Development Models of BISS
Identity Verification Equipment; Volume 2, Automatic Speaker Verification,"
Mitre Technical Report MTR-3442, 1 September 1977.


The $\chi^2$ variable for the "not verified" model must also be translated
and scaled as in Equation (2). From p. 129 of reference 4, the average
verification time is given as 6.2 s. The average time allowed in the BISS
model for verification, however, is just the difference in means (b + 2a)
between the first two entries in Table 5, which is 7 s. With a 99%
verification rate, the average for "not verifieds" can be found by

$$0.99\,(6.2) + 0.01\,[t_{ave}(nv)] = 7.0,$$

$$t_{ave}(nv) = 86.2 \text{ s}.$$

Letting $t_{min}$ for "not verified" equal twelve times $t_{min}$ for each
phrase yields $t_{min}(nv) \approx 40$ s and a = 11.55.
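The parameter values quoted above follow from matching the mean and standard deviation of a shifted, scaled fourth order chi-squared variable. A minimal sketch of that arithmetic, with hypothetical function names; note that the unrounded fit gives a slightly different scale than the 0.1425 obtained after rounding the minimum time to 3.26 s:

```python
import math

NU = 4  # order of the chi-squared distribution (from the text)


def fit_shifted_chi2(t_ave, std):
    """Solve mean = t_min + NU*a and std = a*sqrt(2*NU) for (t_min, a)."""
    a = std / math.sqrt(2 * NU)
    return t_ave - NU * a, a


def scale_from_min(t_ave, t_min):
    """Scale factor once a (rounded) minimum time has been chosen."""
    return (t_ave - t_min) / NU


def not_verified_mean(overall_mean, verified_mean, p_verified):
    """Solve p*verified_mean + (1 - p)*t_nv = overall_mean for t_nv."""
    return (overall_mean - p_verified * verified_mean) / (1.0 - p_verified)
```

With the figures in the text, `fit_shifted_chi2(3.83, 0.4)` gives a minimum time that rounds to 3.26 s, `scale_from_min(3.83, 3.26)` gives 0.1425, and `scale_from_min(86.2, 40.0)` gives 11.55.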

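Hence, the models for verification times are as follows:

$$t_v = \sum_{i=1}^{N} (3.26 + 0.1425\,x_i)$$

$$t_{nv} = 40.0 + 11.55\,x$$

where the "x's" are chi-squared random deviates generated by

$$x = -2 \log\left[\prod_{i=1}^{(\nu-1)/2} r_i\right] + n^{2},$$

where r_i is a rectangular variate with range 0,1 and n is a normal variate
with mean 0 and standard deviation 1. This expression applies to odd $\nu$;
for even $\nu$, such as the $\nu = 4$ used here, the product runs to
$\nu/2$ and the $n^2$ term is omitted.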
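Putting the pieces together, one verification attempt can be simulated as sketched below. The function names are hypothetical, the cumulative probabilities are those of Table 6, and the chi-squared deviate uses the even-order form (two uniform draws for order four), which is an assumption about how the generator was applied.

```python
import bisect
import math
import random

# Table 6: P(verified in N phrases or less), N = 1..12.
CUM_VERIFIED = [0.613, 0.864, 0.946, 0.970, 0.973, 0.975,
                0.979, 0.987, 0.990, 0.990, 0.990, 0.990]


def chi2_4(rng):
    """Fourth order chi-squared deviate, -2*log(r1*r2), with r in (0, 1]."""
    return -2.0 * math.log((1.0 - rng.random()) * (1.0 - rng.random()))


def verification_time(rng):
    """Return (verified, seconds) for one simulated verification attempt."""
    r = rng.random()
    if r >= CUM_VERIFIED[-1]:
        return False, 40.0 + 11.55 * chi2_4(rng)       # "not verified" case
    # Smallest N with p(N - 1) <= r < p(N); N is 1-based.
    n_phrases = bisect.bisect_right(CUM_VERIFIED, r) + 1
    return True, sum(3.26 + 0.1425 * chi2_4(rng) for _ in range(n_phrases))
```

The total booth time would then be this verification time added to a service time drawn with the no-verification parameters of Table 5.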

In conclusion it should be noted that the "dead" time does not include
time lost due to insufficient processing capability. The assumption has been
made in this section that sufficient computing power is available to provide
the user immediate response.

3.  Results 

Simulations were run using the entry and exit rate profiles shown in
Figures 1 and 2 for the seven amplitude factors (AF) given in Table 2 and for
AF = 1.0. Simulations were run for each of the following cases, where each
run simulated five hundred 12-hour days:

1.  Number  of  booths  varying  from  a  minimum  number  (determined  below) 
through  six. 

2.  Verification  required  (a)  on  entry  alone  and  (b)  on  both  entry  and 
exit. 

3.  Verification  modeled  (a)  as  in  the  BISS  specification  and  (b)  using 
the  chi-squared  model  for  each  verification  phrase. 

The  minimum  number  of  booths  can  be  determined  from 

NB  >  total  processing  time/elapsed  time 

NB  >  AF  [(ave  entrant  rate)(ave  entrant  processing  time) 

+  (ave  extrant  rate)  (ave  extrant  processing  time)]. 

Assuming all transactions are "long" ones, the average (b + 2a) booth times
for the first two entries in Table 5 are 19 and 12 seconds, respectively.
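The inequality for the minimum number of booths can be evaluated as sketched below. The function names are hypothetical, and the division by 60 is an assumption made so that rates in people per minute and booth times in seconds combine into a dimensionless load; the entrant and extrant rates are left as arguments.

```python
import math


def booth_load(af, entrant_rate, extrant_rate,
               entrant_time_s=19.0, extrant_time_s=12.0):
    """Average number of booths kept busy (rates per minute, times in seconds)."""
    busy_seconds_per_min = (entrant_rate * entrant_time_s
                            + extrant_rate * extrant_time_s)
    return af * busy_seconds_per_min / 60.0


def min_booths(af, entrant_rate, extrant_rate):
    """Smallest integer NB strictly greater than the load."""
    return math.floor(booth_load(af, entrant_rate, extrant_rate)) + 1


def max_amplitude_factor(nb, entrant_rate, extrant_rate):
    """Largest AF for which nb booths still cover the load."""
    return nb / booth_load(1.0, entrant_rate, extrant_rate)
```

With 4 entrants and 5 extrants per minute at AF = 1, the load under these assumptions is 136/60 booths, so `min_booths(1.0, 4.0, 5.0)` returns 3.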

Figures 1 and 2 show average entrant and extrant rate profiles. Although
the average rates specified in the above inequality could be the average of
these profiles over the 0600 - 1800 time period (1.48 people/min for both
entrants and extrants), since these profiles are themselves averages it is
more appropriate to use the maxima of the average profiles (4 entrants/min
and 5 extrants/min). Maximum AFs as a function of various NB for both
assumptions are given in Table 7.

Table 7

Maximum Amplitude Factors (AFs) as a Function
of Number of Booths (NB)

                                Maximum AF
  Condition       Using Maximum of        Using Average of
                  Average Profiles        Average Profiles


Selected comparisons of these simulations are made in Figures 4 - 8.
Note that all of these figures except Figure 5 are on log-by-probability
scales, where all plots show the probability that a specified time is less
than a certain value. Figure 5 is the same as Figure 4 except the curves are
plotted on linear-by-probability scales instead.

Figures 4 and 5 show the waiting time for an available booth for various
amplitude factors, for the case of three booths, using the TI-modeled
verification, with verification on both entry and exit.

Figure 6 shows not only the waiting time for an available booth, but the
total delay through the booth (waiting and processing) as well. Both times
are plotted for a varying number of booths, for an AF of one, using the
TI-modeled verification, with verification on both entry and exit. The
inflections in the total delay curves around the 99th percentile are due to
the 1% "not verifieds," which have much longer average processing times than
successful verifications.

The fact that "not verifieds" are not explicitly accounted for in the
BISS specification (the MITRE model) becomes apparent in Figure 7. Figure 7
compares the MITRE model to the TI model for the case of five booths, an AF of
one, with verification on both entry and exit.

Figure 8 shows the advantage of verification only on entry if the
scenario permits. This figure shows both the total delay and just the waiting
time for the case of five booths, an AF of one, and using the TI-modeled
verification both for entry alone and for both entry and exit.

More  details  of  the  simulations  are  given  in  Tables  8-11  which  give  the 
averages  in  seconds  of  both  the  waiting  time  alone  and  the  waiting  plus 
processing  time  for  both  entry  and  exits,  and  in  Tables  12  -  15  which  give  the 
95th  and  98th  percentiles  in  seconds  for  both  the  waiting  time  and  the  total 
delay.  The  increased  throughput  shown  in  Tables  8-11  for  the  longer  waiting 
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Table  8 

Average  Waiting  Times  and  Total  Delays  (in  Seconds)  Usin 
T1 -Modeled  Verification  on  Both  Entry  and  Exit 
(Continued) 
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Table  12 

95th  and  98th  Percentiles  fin  Seconds)  for  Waiting  Times  and  Total  Delays  Using 
TI -Modeled  Verifications  on  Both  Entry  and  Exit 


Entrance _  _ Exit 


Wait 

Wait  + 

Proc 

Wait 

Wait  + 

Proc 

AF 

NB 

22 

22 

22 

28 

22 

2§ 

22 

2§ 

0. 

37 

1 

66 

96 

88 

119 

70 

106 

91 

129 

2 

18 

29 

42 

56 

18 

29 

42 

56 

3 

6 

15 

33 

43 

6 

15 

33 

43 

4 

0 

5 

30 

41 

0 

5 

30 

41 

5 

0 

0 

29 

41 

0 

0 

29 

41 

0. 

37 

6 

0 

0 

29 

41 

0 

0 

29 

41 

0. 

52 

1 

94 

139 

115 

159 

104 

158 

124 

178 

2 

21 

33 

44 

59 

22 

33 

44 

60 

3 

10 

16 

34 

44 

10 

16 

34 

44 

4 

0 

8 

30 

41 

0 

8 

30 

41 

1 

5 

0 

0 

29 

41 

0 

0 

29 

41 

0.52 

6 

0 

0 

29 

40 

0 

0 

29 

40 

0. 

8 

1 

384 

402 

555 

573 

539 

740 

556 

758 

2 

30 

45 

52 

71 

31 

47 

53 

74 

3 

13 

19 

36 

46 

13 

19 

36 

46 

4 

2 

12 

31 

42 

2 

12 

31 

42 

5 

0 

2 

29 

40 

0 

2 

29 

40 

0. 

8 

6 

0 

0 

29 

40 

0 

0 

29 

40 

0.94 

2 

36 

54 

58 

81 

4l> 

62 

62 

89 

3 

14 

21 

37 

48 

14 

21 

37 

48 

4 

5 

12 

31 

42 

5 

12 

32 

42 

5 

0 

4 

30 

40 

0 

5 

30 

41 

0.94 

6 

0 

0 

29 

40 

0 

0 

29 

40 

1.0 

2 

39 

59 

6l 

85 

44 

71 

67 

98 

3 

15 

22 

38 

49 

15 

23 

38 

50 

4 

6 

13 

32 

43 

6 

13 

32 

43 

5 

0 

5 

30 

41 

0 

6 

30 

41 

1. 

0 

6 

0 

0 

29 

41 

0 

0 

29 

41 

148 

L  j 


_ Entrance _  _ Exit _ 

Wait  Walt  +  Proc  Wait  Wait  ±  Proc 


AFNB2i28i^282i282i2i 

1.09  2  46  69  68  94  56  92  78  116 

3  16  24  39  50  16  25  39  51 

4  7  14  32  43  7  14  32  43 

5  0  7  30  41  0  7  30  41 

il  6  0  0  29  41  0  0  29  41 


Table  13 


95th and 98th Percentiles (in Seconds) for Waiting Times and Total Delays Using
BISS Specification Model for Verification on Both Entry and Exit

Table 14

95th and 98th Percentiles (in Seconds) for Waiting Times and Total Delays Using
TI-Modeled Verifications on Entry Only

Table 15

95th and 98th Percentiles (in Seconds) for Waiting Times and Total Delays Using
BISS Specification Model for Verification on Entry Only

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times  is  a  direct  consequence  of  a  greater  percentage  of  "short"  transactions 
that occur when someone is waiting to enter a booth that someone is exiting.

Tables 16 and 17 give histograms of both waiting time alone and the
processing time alone as a function of time. Both tables are for two booths,
an AF of one, using the TI-modeled verification. Table 16 is for verification
both on entry and exit, and Table 17 is for verification on entry alone.
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Appendix  II 

PROMPTING  WORD  SELECTION  TRADE-OFF  STUDY 


The results presented in this section are based upon experimental data
extracted from the following six sources:

1.  Phase  I  analysis  at  Texas  Instruments  (July  1974) 

2.  MITRE  Phase  I  evaluation  reference  files  (Oct-Nov  1975) 

3.  MITRE  Phase  II  evaluation  (Aug-Oct  1976) 

4.  21  session  data  set  from  11  speakers  (1979) 

5. CIC entry control system references and casual impostor
sessions from a subset of CIC users (1979)

6.  CIC  entry  control  system  reference  files  (primarily  1979) 

For conciseness, the sources will be referred to by number throughout this
section. These sources of data represent the three different sets of prompting
words given in Table 1. Source 1 used the words in set (A); sources 2-3 used
the words in set (B); sources 4-6 used the words in set (C).

TABLE  1 

SETS  OF  PROMPTING  WORDS 


SET A.   COOL      BIRDS     STOPPED   WEST
         SMALL     BUGS      SING      DOWN
         HUGE      TWIGS     SANG      DEEP
         STRANGE   TOADS     STOOD     WILD

SET B.   NORTH     LAWN      GREAT     CAMP
         SOUTH     LIMB      WIDE      POINT
         EAST      RUN       GOOD      PLOT
         FIRST     ROOM      WEST      CUBE

SET C.   GOOD      BRUCE     CALLED    HARD
         PROUD     BEN       SWAM      NEAR
         STRONG    JOYCE     CAME      NORTH
         YOUNG     JEAN      SERVED    HIGH

The  types  of  data  readily  available  during  this  evaluation  were  not  the 
same  from  all  six  sources.  Useful  information  would  include  the  following: 

1.  Actual  reference  patterns  for  all  enrolled  speakers 

2.  Squared  distances  (scanning  errors)  for  each  word  for  each 
true  speaker  verification  session  (TYPE  I  trial) 

3.  Squared  distances  (scanning  errors)  for  each  word  for  each 
impostor  session  (TYPE  II  trial) 

4.  Number  of  verification  sessions 

5.  Average  EHAT's  (expected  scanning  errors)  for  each  word  for 
each  speaker 

6.  Equal  error  rate  (EER)  for  each  word  for  each  sex  and  type  of 
preprocessing. 

To clarify which of the above data are available from each of the six
sources, a short description of each source is given below. Note that all
impostor trials used only reference files for speakers of the same sex.

I.  DATA  SOURCE  DESCRIPTIONS 


1. The Phase I analysis at TI was based on 63 male references. Although
73 male speakers were enrolled, only 63 had verification sessions. The only
available information from this data set that pertained to this study was the
equal error rates, average scanning errors, and number of sessions for each
speaker. Every session consisted of four phrases (each word used once for each
session). The true speaker scanning errors used in calculating both the average
scanning errors and the equal error rates were from all sessions except
post-enrollment (the first four sessions after enrollment) for all of the 63
true speakers. The impostor scanning errors used in determining equal error
rates were found by comparing two sessions from 60 of the 63 speakers to the
reference files at the end of the experiment (after all adaptation) for the
other 62 speakers, for a total of 7440 impostor trials for each word. The
number of true speaker sessions from which the reference files were derived
ranged from 8 to 130, with an average of 25.5 sessions per speaker.

2. The data available from the Phase I MITRE evaluation consisted of
listings of the reference files at the completion of the test. Although the
actual reference patterns existed in documentation, they were not in a
machine-readable form. The only information extracted from these listings
consisted of the average EHAT's for each word for each speaker, the sex of each
speaker, and the number of sessions for each speaker. These data are from 209
speakers (120 male, 39 female). The average number of sessions was 14.1 for
males and 12.2 for females. The average number of phrases across all speakers
was 4.77 during post-enrollment and 1.70 for subsequent processing.

3. More extensive data existed for the Phase II MITRE evaluation than for
Phase I. Subsets of both the impostor and true speaker trials were punched on
cards from listings, which included other information such as EHAT's, phrase
number, and session number. The number of enrolled speakers was 199 (164 males,
35 females). The number of impostors was 78 (71 males, 7 females) from the set
of enrolled speakers, which were compared to 112 (106 males, 6 females)
reference patterns. The number of impostor trials available, however, was not
the N*(N-1) possible, but was a subset as shown in Table 2. The actual numbers
of impostor and true speaker trials are shown in Table 3.

TABLE  3 

NUMBER  OF  IMPOSTOR  AND  TRUE  SPEAKER  TRIALS  USED 
FROM  MITRE  PHASE  II  EXPERIMENTS 


                          SESSIONS          PHRASES       AVE NO. PER WORD
                        TOTAL  NON PE    TOTAL  NON PE     TOTAL  NON PE

MALE IMPOSTORS            713     -       3366     -         842     -
MALE TRUE SPEAKERS       2373   1781      8262   5962               1466
FEMALE IMPOSTORS           13     -         69     -          17     -
FEMALE TRUE SPEAKERS      498    372      1954   1408                352


4. The fourth data set was collected from 11 speakers (6 males, 5 females)
in the sound booth in the TI Speech Research Lab. Each speaker contributed 21
sessions after enrollment. The primary purpose of these data was to provide a
set of actual speech data that could be massaged through different filters,
preprocessings, etc. However, the limited number of speakers made the general
applicability of the results somewhat questionable.

5. The fifth data set was collected using the TI entry control booth.
This data set included two sets of specially collected impostor data: one set
of 40 males, using BISS-type preprocessing, and one set of 23 males and 9
females using IPMOD2-type preprocessing. Each impostor was enrolled on the
system when the impostor data were collected, but was not necessarily in the
reference data set. Each impostor had one session with three repetitions of
four phrases, each of the 16 words occurring once in the four-phrase set. The
BISS-type preprocessed impostors were compared against reference files for 255
male speakers enrolled on the CIC system as of September 1978. For IPMOD2, the
23 male impostors were compared against 71 male references and the 9 female
impostors were compared against 14 female references, where the references
were as of 3 January 1980 and the true speaker data were from the CIC booth
usage for the previous 6 weeks.

6. The final data set used was just the CIC reference files as of 25
October 1979. The only purpose of this was to calculate various attributes of
the reference files for the speakers that might have shown some correlation
with the equal error rates determined for the CIC data in the prior experiment.
These reference files as of 25 October 1979 contained references for 193
speakers (170 males, 23 females) using BISS-type preprocessing and 74 speakers
(61 males, 13 females) using IPMOD2-type preprocessing.
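
The impostor trial count quoted for source 1 follows directly from the speaker
and session counts given in that description: two sessions from each of 60
speakers, each scored against the references of the other 62 speakers. A
minimal sketch of that arithmetic is given below; the function and parameter
names are illustrative and do not come from the original study.

```python
def impostor_trials_per_word(n_impostors: int, sessions_each: int,
                             n_references: int) -> int:
    """Count impostor trials per word.

    Every session of every impostor is compared with the reference file of
    each enrolled speaker other than the impostor himself.
    """
    return n_impostors * sessions_each * (n_references - 1)


# Source 1: 60 of the 63 speakers acted as impostors, two sessions each.
print(impostor_trials_per_word(n_impostors=60, sessions_each=2,
                               n_references=63))  # 7440
```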

II. ANALYSIS


Shown in Figure 1 is the general form for the true speaker (TYPE I) and
impostor (TYPE II) error rates as a function of distance between an input and a
reference. The task then is to find an easily measurable parameter which has a
well-defined relationship to the error rate, so that prompting words with low
equal error rates can be chosen without performing a large-scale experiment
with many true speaker and impostor trials. Since all six of the data bases
discussed above have the average true speaker distances available (or at least
the EHAT's), that was the first parameter tried. The normalized average
squared distances are given in Tables 4-7 and the EERs are given in Tables
8-10. These are plotted in Figure 2. (* indicates those plots with sample sizes
judged to be adequate.) No distinct relationship between the two variables can
be seen. Considering Figure 1, this is an expected disappointment, since in
this case we are trying to find a relationship between the average of a
distribution and the overlap of its tail with another distribution.
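
To make the quantity under discussion concrete, the equal error rate for one
word can be estimated from a list of true speaker squared distances and a list
of impostor squared distances by sweeping a threshold until the TYPE I and
TYPE II error rates meet. The sketch below is a generic threshold sweep under
that assumption, not the procedure actually used in the experiments; the
function name and the toy distances are hypothetical.

```python
def equal_error_rate(true_dists, impostor_dists):
    """Return (threshold, EER) where TYPE I and TYPE II error rates are closest.

    TYPE I error: a true speaker is rejected (distance above the threshold).
    TYPE II error: an impostor is accepted (distance at or below the threshold).
    """
    best = None
    for t in sorted(set(true_dists) | set(impostor_dists)):
        type1 = sum(d > t for d in true_dists) / len(true_dists)
        type2 = sum(d <= t for d in impostor_dists) / len(impostor_dists)
        gap = abs(type1 - type2)
        if best is None or gap < best[0]:
            best = (gap, t, (type1 + type2) / 2)
    return best[1], best[2]


# Toy data: the two distributions overlap only in their tails.
threshold, eer = equal_error_rate([1, 2, 3, 4], [3, 4, 5, 6])
print(threshold, eer)  # 3 0.25
```

Because the result depends only on where the tails cross, two words with the
same average true speaker distance can yield quite different values.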

The same problem exists using the "median" of the true speaker distribution,
as shown in Figure 3, a plot of EER vs. the medians of the true speaker
squared distances for experiment 5. Figures 4 and 5 show plots of EER vs.
equal error threshold and 10th percentile of the true speaker squared
distances, respectively. Although both of these distribution parameters are
more closely associated with the tails of the distributions, the only
improvements are slightly more skewed plots in Figure 4. The actual true
speaker and impostor distributions from which these data were derived are
shown in Figures 6-8.

The data used to plot Figures 3-5 are shown in tabular form in Tables
11-13. In addition to the true speaker distribution parameters and the equal
error thresholds, these tables also contain three additional columns showing
data associated with the impostor distributions. The first of these columns is
the 10th percentile values of the squared distances for the impostor
distributions for experiment 5. The last two columns were derived from data in
experiment 6. Although the reference files are not identical between
experiments 5 and 6, the supposition is made that the statistics of the
reference files are similar, since many of the enrolled speakers are the same
and the number of speakers is non-negligible. The next-to-last column
represents the 30th percentile of the squared distances between the reference
files for each word and the average reference file for the corresponding word
across all speakers. The final column gives the 10th percentile of the squared
differences between the reference file for each speaker for a given word and
the reference file for every other speaker for the same word. The EER obtained
from experiment 5 is plotted in Figures 9-11 vs. the values given in these
last three columns of Tables 11-13. Although the data in these plots are still
scattered, an inverse relationship between the EER and the distance between
reference files is much more evident than in the prior data. This suggests
that one mechanism that could be used in selecting good prompting words is to
collect single-session data from a large population and select those words
having the largest value at, say, the 10th percentile of the distances
computed between all pairs of patterns for each word. (Note that, although not
presented here, the averages of these distances did not show such a nice
relationship.)
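
The selection mechanism suggested above can be sketched as follows, assuming
each speaker's pattern for a word is available as a flat list of quantized
filter energies. The data layout, the function names, and the
linear-interpolation percentile are assumptions made for illustration; the
report does not specify how the percentile was computed.

```python
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def percentile(values, pct):
    """Percentile (0..100) of a non-empty list, by linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def rank_words(patterns_by_word, pct=10):
    """Rank words by the pct-th percentile of between-speaker distances.

    patterns_by_word maps each word to one pattern per speaker. Words whose
    low-percentile distance is largest (best separated) come first.
    """
    scores = {}
    for word, patterns in patterns_by_word.items():
        dists = [squared_distance(a, b) for a, b in combinations(patterns, 2)]
        scores[word] = percentile(dists, pct)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example with three speakers and two-element patterns.
toy = {"A": [[0, 0], [3, 4], [6, 8]], "B": [[0, 0], [1, 0], [0, 1]]}
print(rank_words(toy))  # [('A', 25.0), ('B', 1.0)]
```

Using a low percentile rather than the average focuses the score on the
closest speaker pairs, which are the ones most likely to produce impostor
acceptances.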

Another approach was to look for some inherent property of the patterns
for the words which related to the equal error rates. Tables 14-16 show the
relationship between the EERs derived from experiment 5 and the following four
measures:

1. Place of articulation of the vowel
(high/low/medial and front/back/center)

2. Percentage (P(I)) of each quantized energy level, I (I = 0, ..., 7),
across all filters

3. A measure of the average density across all reference patterns
for a given word (I*P(I))

4. A measure of the squared deviation from the average energy
level (P(I) times the square of (1 + the integer part of the
magnitude of (3.5-I))).

This last measure was prompted by the idea that, since we are using a squared
distance measure in the verification algorithm, a bland pattern that had many
filters with medial values would, on the average, have lower values of squared
distances for impostors. As can be seen from the tables, none of these
measures seems particularly valuable.
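
Measures 3 and 4 can be written out explicitly if each is read as a sum over
the eight quantized energy levels. That reading is an assumption, since the
text gives only the per-level term. The sketch below applies it to the level
percentages printed in the first row of Table 14; the function names are
illustrative.

```python
def density_measure(p):
    """Measure 3: sum over levels I = 0..7 of I * P(I)."""
    return sum(i * p_i for i, p_i in enumerate(p))


def deviation_measure(p):
    """Measure 4: sum over I of P(I) * (1 + int(|3.5 - I|)) ** 2."""
    return sum(p_i * (1 + int(abs(3.5 - i))) ** 2 for i, p_i in enumerate(p))


# Level percentages P(0)..P(7) from the first row of Table 14.
p_first_row = [0.30, 0.14, 0.11, 0.10, 0.11, 0.10, 0.08, 0.06]
print(round(density_measure(p_first_row), 2))    # 2.5
print(round(deviation_measure(p_first_row), 2))  # 8.79
```

These land close to, though not exactly on, the 2.45 and 8.75 printed for that
row, presumably because the tabulated percentages are rounded to two places.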

One final approach was to try to relate the EERs to the values of the
formant locations for the vowels in the patterns. This is done in Table 17,
which gives the center frequencies of the 14 filters and the center
frequencies of the formants of the vowels in the central section of each of
the prompting words. (Two lines are used in the table for initial and final
locations for diphthongs.) Immediately obvious are the generally increasing
EERs as the values of the second and third formants increase. This may be due
to the lack of proper amplitude compensation for the wider-bandwidth top three
filters in the preprocessing algorithm. The glaring exception to this trend in
the second formant is for the words containing the vowel "AW" (called,
strong). The problem in these cases becomes obvious by investigating some
actual reference patterns for these two words. The gentle roll-off of the
filters causes a smearing of the energies out of filters 2-4 such that an
impostor does not encounter such drastic energy differences as the locations
of the first and second formants vary, resulting in low scanning errors for
these words, even for impostors.

One final observation can be made from the EERs in the starred (*) columns
of Tables 8-10. This observation is that, for the same vowel, those words
having a prevocalic "R" tend to have lower relative EERs than those without the
formant movement caused by the adjacent "R".

III.  CONCLUSIONS 


Based upon the above sets of experiments, the following guidelines should
be used in the selection of sets of monosyllabic prompting words for voice
authentication systems:

1. One session of all candidate words should be collected from
several hundred speakers, and those words having low relative
between-speaker squared distances at, say, the 10th percentile
should be excluded.

2. For the BISS* filter bank, words should be excluded whose
central vowel has either of the following properties:

A. Third formant above 2700 Hz and second formant above 2000 Hz
(EI as in "CAME" and EE as in "JEAN")

B. Close first and second formants throughout the duration
of the vowel (AW as in "CALLED" or "STRONG").

3. Words having prevocalic "R's" should be favored over those
without.

4. Words having little perceptual difference should not both be
used (such as "RUN" and "ROOM"), since it has been observed
from listening to a small sample of the tapes of true speaker
trials from the MITRE Phase II experiments that speakers
sometimes confuse such pairs of words.

In addition to these rules, the adjective-noun-verb-adverb paradigm was
found to be preferable to the adjective-noun-adjective-noun paradigm used on
BISS. This was also concluded from the MITRE tapes, which showed a
non-negligible number of trials in which the subject paused so long between
the middle two words that the system thought the speaker had finished
speaking.

One additional rule in choosing word sets is prompted by the
reasonable-sounding conjecture that, since initial reference point
determination during enrollment is made based upon the location of energy
peaks, the elimination of between-word nasal/semivowel/glide bridges would
minimize the energy-based syllable detection problem during enrollment.


* The "BISS" filter bank is as defined in Table I, page 20, of "SPEAKER
VERIFICATION," RADC-TR-74-179, April 1974. These filters have constant
bandwidths of 220 Hz and constant spacing between center frequencies of 179 Hz.
Note that the top two filters (C.F.'s of 2805 and 2980 Hz) are not used in
either the BISS-type or the IPMOD2-type preprocessing.


Finally, due to the importance of this type of investigation in terms of
the potential for reducing error rates, this summary should be considered only
an interim report in a continuing investigation. Some of the tasks in this
continuing investigation are as follows:

1. See if EERs for the words vary with the choice of center
frequencies and bandwidths of the filter bank.

2. Choose some substitute words for

A. JEAN (JAN, JUNE, JOAN)

B. CALLED

C. STRONG

D. CAME.

3. Consider word set modifications that eliminate the nasal
bridge that can exist between the third and fourth words
of word set "C".

4.  Consider  using  some  two-syllable  prompting  words 

A.  Advantage  -  Performance  improves  as  the  amount  of  speech 
data  increases 

B.  Disadvantage  -  Enrollment  algorithm  that  looks  for  four 
energy  peaks  would  have  to  be  modified. 

5.  After  choosing  several  candidate  substitute  words,  collect 
data  from  the  approximately  40  males  at  the  TI  Hillcrest 
location  and  perform  the  between  speaker  distance  measure 
experiment  proposed  at  the  beginning  of  this  section. 

6.  Consider  methods  for  quantitatively  measuring  the  perceptual 
difference  between  words  in  a  word  set. 

7.  Consider  allowing  present  tense  verbs  (either  partially  or 
totally)  in  place  of  the  past  tense  verbs  currently  used. 
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TABLE  4. NORMALIZED  AVERAGE  SQUARED  DISTANCE  FOR  TRUE  SPEAKERS  FROM 
THEIR  REFERENCES  FOR  MALES  WITH  BISS-TYPE  PREPROCESSING 

NORMALIZED AVERAGE TRUE SPEAKER DISTANCES
FOR DATA BASE X / WORD SET Y

                                            WORDS
VOWEL     1/A   *2/B    4/C    5/C   *6/C   A         B         C

EE        .78    .87    .89    .71    .74   DEEP      EAST      JEAN
IX        .90    .86    .74    .65    .69   TWIGS     LIMB      NEAR
         1.00                               SING
EH        .84    .74    .82    .79    .81   WEST      WEST      BEN
AE        .79    .86   1.00   1.00   1.00   SANG      CAMP      SWAM
AA        .67    .41    .60    .47    .50   STOPPED   PLOT      HARD
AW        .39    .46    .37    .40    .42   SMALL     LAWN      CALLED
                        .81    .61    .59                       STRONG
UX        .99   1.00    .95    .92    .96   STOOD     GOOD      GOOD
UU       1.00    .90    .69    .77    .84   COOL      ROOM      BRUCE
UH        .70    .77    .96    .91    .92   BUGS      RUN       YOUNG
IR        .78    .67    .67    .69    .69   BIRDS     FIRST     SERVED
AU        .73    .60    .86    .80    .83   DOWN      SOUTH     PROUD
AI        .61    .66    .49    .56    .55   WILD      WIDE      HIGH
OI               .67    .88    .86    .86             POINT     JOYCE
IU        .87    .82                        HUGE      CUBE
OU        .79                               TOADS
EI        .86    .77    .91    .73    .77   STRANGE   GREAT     CAME
OX               .71    .75    .63    .65             NORTH     NORTH

MAX.      125    164    118    139    125
AVE. 91.4 99.9 92.0


TABLE 5. NORMALIZED AVERAGE SQUARED DISTANCE FOR TRUE SPEAKERS FROM
THEIR REFERENCES FOR FEMALES WITH BISS-TYPE PREPROCESSING

NORMALIZED AVERAGE TRUE SPEAKER DISTANCES
FOR DATA BASE X / WORD SET Y

                                     WORDS
VOWEL    *2/B    4/C    5/C   *6/C   B         C

EE        .87   1.00    .94    .94   EAST      JEAN
IX        .85    .82    .91    .84   LIMB      NEAR
EH        .77    .99    .93    .94   WEST      BEN
AE        .88    .81    .95    .94   CAMP      SWAM
AA        .40    .46    .41    .39   PLOT      HARD
AW        .46    .31    .34    .35   LAWN      CALLED
                 .71    .82    .85             STRONG
UX       1.00    .65    .90    .83   GOOD      GOOD
UU        .89    .65    .78    .81   ROOM      BRUCE
UH        .75    .95   1.00   1.00   RUN       YOUNG
IR        .62    .59    .74    .78   FIRST     SERVED
AU        .54    .85    .83    .83   SOUTH     PROUD
AI        .67    .74    .60    .58   WIDE      HIGH
OI        .65    .82    .81    .85   POINT     JOYCE
IU        .85                        CUBE
EI        .77    .90    .81    .83   GREAT     CAME
OX        .64    .65    .54    .59   NORTH     NORTH

MAX.      165    141    152    142
AVE. 104.9 116.9 109.6

* EHATS (EXPECTED SCANNING ERRORS)
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TABLE  6. FORMALIZED  AVERAGE  SQUARED  DISTANCE  FOR  TRUE 
SPEAKERS  FROM  THEIR  REFERENCES  FOR  MALES  WITH 
IPMOD2-TYPE PREPROCESSING

NORMALIZED AVERAGE TRUE SPEAKER DISTANCES
FOR DATA BASE X / WORD SET Y

                                     WORDS
VOWEL     3/B    4/C    5/C   *6/C   B         C

EE        .81    .81    .79    .71   EAST      JEAN
IX        .98    .72    .64    .68   LIMB      NEAR
EH        .67    .77    .73    .77   WEST      BEN
AE       1.00   1.00   1.00   1.00   CAMP      SWAM
AA        .45    .59    .45    .44   PLOT      HARD
AW        .43    .34    .48    .38   LAWN      CALLED
                 .86    .58    .53             STRONG
UX        .72    .79    .96    .95   GOOD      GOOD
UU        .82    .55    .78    .73   ROOM      BRUCE
UH        .87    .89    .92    .98   RUN       YOUNG
IR        .57    .59    .69    .61   FIRST     SERVED
AU        .65    .84    .83    .78   SOUTH     PROUD
AI        .66    .51    .51    .53   WIDE      HIGH
OI        .68    .81    .83    .79   POINT     JOYCE
IU        .66                        CUBE
EI        .65    .93    .87    .79   GREAT     CAME
OX        .71    .68    .59    .59   NORTH     NORTH

MAX.      153    158    173    184
AVE. 108.9 125.1 127.7


TABLE 7. NORMALIZED AVERAGE SQUARED DISTANCE FOR TRUE
SPEAKERS FROM THEIR REFERENCES FOR FEMALES
WITH IPMOD2-TYPE PREPROCESSING

NORMALIZED AVERAGE TRUE SPEAKER DISTANCES
FOR DATA BASE X / WORD SET Y

                                     WORDS
VOWEL     3/B    4/C    5/C   *6/C   B         C

EE        .39    .78    .94    .85   EAST      JEAN
IX        .78    .66    .79    .81   LIMB      NEAR
EH        .62    .85    .87    .86   WEST      BEN
AE        .61    .81   1.00    .98   CAMP      SWAM
AA        .34    .42    .34    .34   PLOT      HARD
AW        .44    .27    .38    .28   LAWN      CALLED
                 .63    .68    .73             STRONG
UX        .47    .52    .77    .76   GOOD      GOOD
UU        .74    .44    .67    .71   ROOM      BRUCE
UH       1.00    .87    .95   1.00   RUN       YOUNG
IR        .54    .48    .64    .69   FIRST     SERVED
AU        .67   1.00    .82    .81   SOUTH     PROUD
AI        .51    .77    .76    .61   WIDE      HIGH
OI        .56    .68    .76    .81   POINT     JOYCE
IU        .42    .49                 CUBE
EI        .57    .86    .72    .77   GREAT     CAME
OX        .75    .62    .49    .54   NORTH     NORTH

MAX.      241    189    284    218
AVE. 133.2 146.6 157.4

* EHATS (EXPECTED SCANNING ERRORS)
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TABLE  B. EQUAL  ERROR  RATE  AS  FUNCTION  OF  WORD  FOR  SEVERAL 
DATA BASES FOR MALES & FEMALES WITH BISS-TYPE
PREPROCESSING

EQUAL ERROR RATE FOR DATA BASE X / WORD SET Y

                                         WORDS
VOWEL   *1/A/M   4/C/M   4/C/F  *5/C/M   A         C

EE        .229     .07     .12    .171   DEEP      JEAN
IX        .150     .06     .07    .160   TWIGS     NEAR
          .178                           SING
EH        .125     .05    .065    .136   WEST      BEN
AE        .169     .08     .09    .133   SANG      SWAM
AA        .140     .06    .185    .141   STOPPED   HARD
AW        .218     .10     .31    .190   SMALL     CALLED
                   .07     .13    .196             STRONG
UX        .178     .05     .11    .116   STOOD     GOOD
UU        .180    .035    .115    .105   COOL      BRUCE
UH        .160     .09    .045    .121   BUGS      YOUNG
IR        .185    .065     .08    .137   BIRDS     SERVED
AU        .110    .085    .135    .118   DOWN      PROUD
AI        .185     .07    .135    .140   WILD      HIGH
OI                 .05     .09    .116             JOYCE
IU        .165                           HUGE
OU        .130                           TOADS
EI        .115    .055     .08    .161   STRANGE   CAME
OX                .055     .11    .123             NORTH


TABLE 9. EQUAL ERROR RATE AS FUNCTION OF WORD
FOR SEVERAL DATA BASES FOR MALES WITH
IPMOD2-TYPE PREPROCESSING

EQUAL ERROR RATE FOR DATA BASE X / WORD SET Y

                              WORDS
VOWEL    *3/B    4/C   *5/C   B         C

EE       .318   .053   .173   EAST      JEAN
IX       .297   .063   .135   LIMB      NEAR
EH       .244   .045   .098   WEST      BEN
AE       .205   .086   .104   CAMP      SWAM
AA       .242   .050   .114   PLOT      HARD
AW       .269   .083   .160   LAWN      CALLED
                .053   .188             STRONG
UX       .285   .054   .088   GOOD      GOOD
UU       .254   .022   .089   ROOM      BRUCE
UH       .247   .081   .128   RUN       YOUNG
IR       .189   .067   .084   FIRST     SERVED
AU       .230   .075   .091   SOUTH     PROUD
AI       .246   .072   .121   WIDE      HIGH
OI       .286   .049   .103   POINT     JOYCE
IU       .285   .336          CUBE
EI       .256   .042   .188   GREAT     CAME
OX       .233   .063   .076   NORTH     NORTH

* ADEQUATE (OR AT LEAST NOT INADEQUATE) SAMPLE SIZE
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TABLE  1*. EQUAL  ERROR  RATE  AS  FUNCTION  OF  WORD 
FOR  SEVERAL  DATA  BASES  FOR  FEMALES 
WITH IPMOD2-TYPE PREPROCESSING

EQUAL ERROR RATE FOR DATA BASE X / WORD SET Y

                              WORDS
VOWEL    *3/B    4/C   *5/C   B         C

EE       .389   .111   .179   EAST      JEAN
IX       .383   .078   .123   LIMB      NEAR
EH       .296   .046   .168   WEST      BEN
AE       .176   .087   .212   CAMP      SWAM
AA       .238   .163   .149   PLOT      HARD
AW       .294   .280   .192   LAWN      CALLED
                .129   .203             STRONG
UX       .383   .100   .189   GOOD      GOOD
UU       .471   .128   .078   ROOM      BRUCE
UH       .309   .042   .099   RUN       YOUNG
IR       .294   .088   .102   FIRST     SERVED
AU       .278   .119   .134   SOUTH     PROUD
AI       .188   .144   .162   WIDE      HIGH
OI       .188   .088   .108   POINT     JOYCE
IU       .276   .278          CUBE
EI       .278   .083   .206   GREAT     CAME
OX       .418   .068   .132   NORTH     NORTH

TABLE 11. TYPE I/TYPE II DISTRIBUTION PARAMETERS FOR DATA BASE 5
FOR MALES WITH BISS-TYPE PREPROCESSING

                TRUE SPEAKER DIST.         EQUAL     IMP. REF FILE REF FILE
                                           ERROR    DIST.    DIST.     AVE.
                                                          FROM AVE    DIST.
VOWEL    EER      AVE   MEDIAN  10%TILE   THRESH  10%TILE  30%TILE  10%TILE   WORD

UU      .108      .77      .80      .77      .83      .89      .83      .88   BRUCE
UX      .116      .92      .94      .92      .97     1.00      .77      .74   GOOD
OI      .116      .86      .86      .84      .88      .92     1.00     1.00   JOYCE
AU      .118      .80      .79      .83      .86      .89      .81      .81   PROUD
UH      .121      .91      .93      .88      .91      .94      .66      .69   YOUNG
OX      .123      .63      .61      .64      .67      .67      .70      .70   NORTH
AE      .133     1.00     1.00     1.00     1.00      .99      .98      .98   SWAM
EH      .136      .79      .76      .83      .81      .79      .66      .62   BEN
IR      .137      .69      .68      .73      .72      .69      .83      .84   SERVED
AI      .140      .86      .81      .61      .88      .84      .73      .86   HIGH
AA      .141      .47      .48      .80      .48      .46      .43      .40   HARD
IX      .161      .68      .61      .69      .63      .87      .74      .67   NEAR
EI      .161      .73      .70      .77      .72      .66      .60      .82   CAME
EE      .171      .71      .69      .76      .68      .61      .70      .87   JEAN
AW      .190      .40      .38      .44      .37      .30      .32      .30   CALLED
AW      .196      .61      .87      .64      .88      .46      .34      .39   STRONG

MAX.              139      122      228      206      189      181      218

* ADEQUATE (OR AT LEAST NOT INADEQUATE) SAMPLE SIZE
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TABLE  12. TYPE  I /TYPE  II  DISTRIBUTION  PARAMETERS  FOR  DATA  BASE  5 
FOR MALES WITH IPMOD2-TYPE PREPROCESSING

                TRUE SPEAKER DIST.         EQUAL     IMP. REF FILE REF FILE
                                           ERROR    DIST.    DIST.     AVE.
                                                          FROM AVE    DIST.
VOWEL    EER      AVE   MEDIAN  10%TILE   THRESH  10%TILE  30%TILE  10%TILE   WORD

OX      .076      .89      .88      .60      .66      .71      .75      .75   NORTH
IR      .084      .69      .70      .69      .75      .77      .60      .61   SERVED
UX      .088      .96      .97      .95     1.00     1.00      .86      .79   GOOD
UU      .089      .78      .80      .77      .81      .80      .98     1.00   BRUCE
AU      .091      .83      .82      .90      .95      .94     1.00      .95   PROUD
EH      .098      .73      .69      .83      .86      .83      .77      .67   BEN
OI      .103      .83      .84      .83      .84      .78      .91      .88   JOYCE
AE      .104     1.00     1.00     1.00     1.00      .93      .99     1.00   SWAM
AA      .114      .48      .42      .82      .81      .43      .46      .47   HARD
AI      .121      .81      .48      .60      .58      .48      .67      .56   HIGH
UH      .128      .92      .93      .93      .89      .74      .68      .71   YOUNG
IX      .138      .64      .60      .69      .63      .51      .77      .66   NEAR
AW      .160      .40      .36      .45      .38      .27      .35      .28   CALLED
EE      .173      .79      .78      .84      .72      .51      .56      .53   JEAN
AW      .188      .88      .88      .63      .84      .36      .34      .37   STRONG
EI      .188      .87      .86      .94      .78      .55      .58      .85   CAME

MAX.              173      184      278      269      286      173      283


TABLE 13. TYPE I/TYPE II DISTRIBUTION PARAMETERS FOR DATA BASE 8
FOR FEMALES WITH IPMOD2-TYPE PREPROCESSING

                                                            REF FILE   REF FILE
                  TRUE SPEAKER DIST.      EQUAL    IMP.     DIST.      AVE.
                                          ERROR    DIST.    FROM AVE   DIST.

VOWEL   EER     AVE    MEDIAN  10%TILE    THRESH   10%TILE  30%TILE    10%TILE   WORD

UU      .078    .67     .69     .66        .73      .77      .65        .58      BRUCE
UH      .099    .95     .97     .95       1.00     1.00      .88        .77      YOUNG
IR      .102    .64     .60     .69        .72      .70      .42        .46      SERVED
OI      .108    .76     .74     .77        .79      .76     1.00        .92      JOYCE
IX      .123    .79     .77     .83        .83      .80      .78        .88      NEAR
OX      .132    .49     .46     .86        .54      .45      .65        .75      NORTH
AU      .134    .82     .80     .85        .83      .75      .70        .89      PROUD
AA      .149    .34     .29     .40        .37      .21      .37        .24      HARD
UX      .159    .77     .74     .84        .76      .67      .37        .44      GOOD
AI      .162    .76     .68     .92        .81      .66      .66        .47      HIGH
EH      .165    .87     .85     .92        .86      .67      .97       1.00      BEN
EE      .179    .94     .96     .92        .84      .72      .54        .54      JEAN
AW      .192    .30     .24     .39        .29      .19      .29        .18      CALLED
AW      .203    .68     .60     .85        .66      .45      .53        .54      STRONG
EI      .206    .72     .64     .88        .68      .32      .65        .81      CAME
AE      .212   1.00    1.00    1.00        .88      .67      .95        .80      SWAM
MAX.            204     189     315        299      303      172        295



TABLE 14. PARAMETERS DERIVED FROM REFERENCE PATTERNS FOR MALES WITH BISS-TYPE
PREPROCESSING FROM DATA BASE 6.

VOWL  TYP     EER    % 0   % 1   % 2   % 3   % 4   % 5   % 6   % 7   I*P(I)    **

UU    HB      .105   0.30  0.14  0.11  0.10  0.11  0.10  0.08  0.06   2.45    8.75
UX    HB      .116   0.29  0.19  0.12  0.10  0.10  0.08  0.08  0.05   2.32    8.79
OI    MB/HF   .116   0.28  0.15  0.12  0.11  0.12  0.10  0.09  0.04   2.49    8.23
AU    LB/HB   .118   0.29  0.14  0.11  0.11  0.09  0.09  0.12  0.05   2.57    8.73
UH    MC      .121   0.27  0.19  0.13  0.11  0.10  0.09  0.08  0.04   2.37    8.41
OX    MB      .123   0.28  0.10  0.11  0.17  0.13  0.08  0.08  0.06   2.60    8.03
AE    LF      .133   0.22  0.20  0.15  0.13  0.11  0.08  0.07  0.03   2.37    7.58
EH    MF      .136   0.30  0.21  0.13  0.10  0.08  0.05  0.06  0.06   2.16    9.10
IR    HC      .137   0.30  0.14  0.11  0.10  0.10  0.10  0.08  0.06   2.47    8.92
AI    LB/HF   .140   0.36  0.10  0.07  0.06  0.09  0.09  0.15  0.08   2.67   10.11
AA    LB      .141   0.34  0.08  0.09  0.10  0.12  0.10  0.10  0.09   2.73    9.36
IX    HF      .160   0.31  0.16  0.13  0.12  0.08  0.06  0.0?  0.07   2.33    9.12
EI    MF/HF   .161   0.23  0.20  0.14  0.14  0.12  0.07  0.06  0.05   2.40    7.78
EE    HF      .171   0.19  0.21  0.16  0.13  0.11  0.09  0.06  0.05   2.52    7.52
AW    MB      .190   0.34  0.08  0.09  0.10  0.10  0.08  0.11  0.10   2.72    9.62
AW    MB      .196   0.34  0.09  0.09  0.09  0.11  0.09  0.10  0.08   2.61    9.40

TABLE 15. PARAMETERS DERIVED FROM REFERENCE PATTERNS FOR MALES WITH IPMOD2-TYPE
PREPROCESSING FROM DATA BASE 6.

VOWL  TYP     EER    % 0   % 1   % 2   % 3   % 4   % 5   % 6   % 7   I*P(I)    **

OX    MB      .076   0.25  0.10  0.11  0.15  0.16  0.09  0.07  0.06   2.71    7.70
IR    HC      .084   0.31  0.14  0.11  0.10  0.09  0.09  0.09  0.08   2.04    9.29
UX    HB      .088   0.28  0.16  0.12  0.10  0.10  0.09  0.08  0.05   2.46    8.63
UU    HB      .089   0.29  0.14  0.11  0.10  0.10  0.10  0.09  0.06   2.57    8.78
AU    LB/HB   .091   0.31  0.12  0.10  0.09  0.09  0.10  0.11  0.08   2.69    9.37
EH    MF      .098   0.29  0.19  0.12  0.10  0.10  0.06  0.06  0.07   2.33    8.98
OI    MB/HF   .103   0.26  0.10  0.12  0.11  0.10  0.10  0.10  0.05   2.59    8.42
AE    LF      .104   0.23  0.19  0.14  0.11  0.11  0.09  0.07  0.05   2.00    8.03
AA    LB      .114   0.32  0.08  0.08  0.10  0.13  0.10  0.10  0.09   2.79    9.13
AI    LB/HF   .121   0.37  0.10  0.06  0.05  0.07  0.10  0.14  0.10   2.74   10.43
UH    MC      .128   0.28  0.17  0.12  0.11  0.11  0.09  0.07  0.06   2.46    8.63
IX    HF      .135   0.28  0.15  0.13  0.13  0.10  0.06  0.07  0.08   2.49    8.70
AW    MB      .160   0.30  0.08  0.10  0.12  0.14  0.08  0.10  0.09   2.80    8.73
EE    HF      .173   0.17  0.20  0.14  0.12  0.12  0.13  0.06  0.05   2.73    7.26
EI    MF/HF   .188   0.22  0.18  0.13  0.13  0.12  0.10  0.06  0.06   2.58    7.81
AW    MB      .188   0.33  0.09  0.08  0.09  0.10  0.11  0.07  0.09   2.66    9.11

TABLE 16. PARAMETERS DERIVED FROM REFERENCE PATTERNS FOR FEMALES WITH IPMOD2-TYPE
PREPROCESSING FROM DATA BASE 6.

VOWL  TYP     EER    % 0   % 1   % 2   % 3   % 4   % 5   % 6   % 7   I*P(I)    **

UU    HB      .078   0.25  0.17  0.12  0.11  0.10  0.10  0.09  0.05   2.55    8.33
UH    MC      .099   0.22  0.20  0.14  0.11  0.10  0.10  0.10  0.03   2.52    7.91
IR    HC      .102   0.30  0.14  0.10  0.11  0.08  0.08  0.13  0.06   2.08    9.06
OI    MB/HF   .108   0.23  0.13  0.14  0.14  0.13  0.10  0.08  0.04   2.62    7.51
IX    HF      .123   0.21  0.17  0.12  0.12  0.13  0.13  0.09  0.03   2.71    7.43
OX    MB      .132   0.27  0.11  0.10  0.13  0.16  0.09  0.09  0.07   2.73    8.18
AU    LB/HB   .134   0.29  0.15  0.10  0.10  0.10  0.09  0.12  0.05   2.56    8.80
AA    LB      .149   0.37  0.10  0.05  0.08  0.07  0.13  0.12  0.09   2.70   10.19
UX    HB      .159   0.25  0.19  0.12  0.11  0.09  0.09  0.10  0.05   2.49    8.41
AI    LB/HF   .162   0.31  0.15  0.08  0.06  0.10  0.08  0.18  0.04   2.66    9.43
EH    MF      .165   0.20  0.14  0.14  0.13  0.15  0.13  0.09  0.03   2.80    6.97
EE    HF      .179   0.12  0.10  0.14  0.14  0.21  0.18  0.05  0.02   3.02    5.50
AW    HB      .192   0.39  0.07  0.05  0.07  0.04  0.13  0.14  0.10   2.74   10.68
AW    MB      .203   0.30  0.15  0.08  0.07  0.10  0.11  0.15  0.04   2.62    9.08
EI    MF/HF   .206   0.13  0.13  0.13  0.18  0.18  0.13  0.10  0.02   3.03    5.87
AE    LF      .212   0.19  0.19  0.14  0.10  0.12  0.12  0.13  0.03   2.76    7.51

**  [P(I)*(1+INT(3.5-I))**2]
Figure 5   EER vs 10th Percentile of True Speaker Squared Distances for CIC
Data Set (5) Conditional on Word (16 dots)

Figure 6   CIC Error Distributions by Word for Males, BISS Preprocessing

Figure 9   EER vs 10th Percentile of Impostor Square Distances for CIC Data
(Set 5) Conditional on Word (16 dots)

Figure 10  EER for CIC Data (Set 5) vs 30th Percentile of Distances between
Reference Files and Average Reference File for Each Word (16 dots)
Across all Speakers for CIC Data (Set 6)

Figure 11  EER for CIC Data (Set 5) vs 10th Percentile of Distances Between
all Reference Pattern Pairs for Each Word (16 dots) for CIC Data
(Set 6)
Appendix III

ENROLLMENT  AND  VERIFICATION  ALGORITHMS 


For  the  algorithms  described  in  this  section  one  of  32  unique 
phrases  is  prompted  through  the  prompting  unit  to  the  entrant  who  in 
turn  repeats  the  phrase  into  the  microphone.  The  data  from  the 
microphone  is  filtered  and  compressed  and  then  processed  to  find 
either  peaks  in  speech  energy  or  valleys  in  scanning  errors  (where 
the  scanning  error  is  the  difference  between  the  expected  and  the 
received  speech  patterns).  From  the  peaks  or  valleys  an  optimum 
sequence  of  4  peaks  or  valleys  is  found  and  used  to  create  or  update 
reference  file  data  (i.e.,  expected  data)  and,  in  the  case  of 
verification,  determine  if  the  entrant  is  verified  or  not  verified. 
This  prompting  of  phrases  and  processing  of  the  results  is  repeated 
until,  in  the  case  of  enrollment,  a  suitable  reference  file  is 
created,  or,  in  the  case  of  verification,  a  pass/fail  decision  is 
made  for  the  entrant.  The  following  are  some  of  the  data  which 
shall be required for these algorithms. Subsequent sections
describe the enrollment algorithm, 10.2; the verification algorithm,
10.3; the training mode, 10.4; and the validation mode, 10.5.

10.1  Data requirements

10.1.1  Authorization  file.  The  authorization  file  contains  the 
individual's  personal  data  (name,  rank,  organization,  identification 
(ID)  number  and  personal  code)  as  well  as  information  pertinent  to 
the  operation  of  the  ECS  (such  as  sites  to  which  access  is  allowed, 
times/days  of  authorized  entry,  escort  privilege).  This  file  is 
created  at  the  time  of  enrollment  and  is  used  during  verification  to 
determine  if  the  verification  process  should  take  place  based  on  the 
authorization  afforded  the  individual  associated  with  the  ID  number 
imbedded  in  the  badge. 

10.1.2  Speech  reference  file.  The  speech  reference  file  contains 
data  pertinent  to  the  speech  attributes  of  the  individual  and 
consists  of  expected  energy  patterns,  time  intervals  and  scanning 
errors  for  each  of  the  possible  words  used  to  compose  a  phrase.  A 
detailed  description  of  how  these  data  are  obtained  and  used  by  the 
system  is  provided  below. 

10.1.3  Prompt  data.  The  requirements  for  the  generation  of  phrases 
to  be  prompted  through  the  prompting  unit  are  described  herein. 

10.1.3.1  Prompt  data  requirements. 

a.  Phrases  to  be  prompted  to  the  individual  for  the  purpose  of 
creating  a  speech  reference  file  and  for  the  purpose  of  verifying 
against  said  reference  file  data  shall  consist  of  phrases  listed  in 
Table  V. 
TABLE V

PROMPTED PHRASES

Phrase   Group
Number   Number   Phrase                    Word Numbers (m)

   1       1      north lawn great camp     1, 5,  9, 13
   9       1      south limb wide point     2, 6, 10, 14
  17       1      east run good plot        3, 7, 11, 15
  25       1      first room west cube      4, 8, 12, 16

   2       2      north lawn great cube     1, 5,  9, 16
  10       2      south limb wide plot      2, 6, 10, 15
  18       2      east run good point       3, 7, 11, 14
  26       2      first room west camp      4, 8, 12, 13

   3       3      north lawn good plot      1, 5, 11, 15
  11       3      south limb west cube      2, 6, 12, 16
  19       3      east run great camp       3, 7,  9, 13
  27       3      first room wide point     4, 8, 10, 14

   4       4      north lawn good point     1, 5, 11, 14
  12       4      south limb west camp      2, 6, 12, 13
  20       4      east run great cube       3, 7,  9, 16
  28       4      first room wide plot      4, 8, 10, 15

   5       5      north limb wide point     1, 6, 10, 14
  13       5      south lawn great camp     2, 5,  9, 13
  21       5      east room west cube       3, 8, 12, 16
  29       5      first run good plot       4, 7, 11, 15

   6       6      north limb wide plot      1, 6, 10, 15
  14       6      south lawn great cube     2, 5,  9, 16
  22       6      east room west camp       3, 8, 12, 13
  30       6      first run good point      4, 7, 11, 14

   7       7      north limb west cube      1, 6, 12, 16
  15       7      south lawn good plot      2, 5, 11, 15
  23       7      east room wide point      3, 8, 10, 14
  31       7      first run great camp      4, 7,  9, 13

   8       8      north limb west camp      1, 6, 12, 13
  16       8      south lawn good point     2, 5, 11, 14
  24       8      east room wide plot       3, 8, 10, 15
  32       8      first run great cube      4, 7,  9, 16

b.  The instructive phrases "louder please" and "thank you"
shall be available for prompting to the individual under conditions
to be specified below.

c.  Other  phrases  deemed  necessary  to  make  the  system  more 
pleasant  may  be  added.  Some  examples  to  be  considered  are: 
"verified",  "not  verified",  "call  for  assistance",  "good  morning", 
etc. 

10.1.3.2  Phrase  Construction.  The  phrases  to  be  prompted  to  the 
individual  for  the  creation  of  speech  reference  file  data  and  for 
the  verification  against  said  reference  file  data  shall  be  limited 
to  the  32  phrases  of  Table  V.  These  32  phrases  consist  of  four 
words each (n = 1, 2, 3, 4) selected from a total of 16 possible
words (m = 1, 2, ..., 16) where the order of occurrence of the four
words  within  any  phrase  shall  be  fixed.  The  use  of  the  index  n  to 
represent  the  position  of  the  word  within  the  phrase  as  opposed  to  m 
which  represents  the  number  of  the  word  spoken  will  be  maintained 
throughout  this  appendix.  As  an  example,  the  word  "lawn"  is  word 
number  m=5  but  is  always  spoken  as  the  second  word  in  the  phrase, 
hence  n=2.  These  phrases  may  be  stored  either  as  32  four  word 
phrases  or  as  16  words  and  the  phrases  constructed  at  the  time  of 
prompting. 
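
The second storage option (16 words, with phrases assembled at prompt time) can be sketched as follows. This is a minimal illustration only: the word list is read off Table V, and the helper names `position_of` and `build_phrase` are hypothetical, not part of the specification.

```python
# Word list indexed by word number m (1..16), as read from Table V.
# The position n of a word within a phrase is fixed by its word number.
WORDS = [
    "north", "south", "east", "first",   # n = 1, m = 1..4
    "lawn", "limb", "run", "room",       # n = 2, m = 5..8
    "great", "wide", "good", "west",     # n = 3, m = 9..12
    "camp", "point", "plot", "cube",     # n = 4, m = 13..16
]


def position_of(m):
    """Return n (1..4), the fixed position of word m within any phrase."""
    return (m - 1) // 4 + 1


def build_phrase(word_numbers):
    """Assemble a phrase from four word numbers, one per position n."""
    if [position_of(m) for m in word_numbers] != [1, 2, 3, 4]:
        raise ValueError("word numbers must fill positions n = 1, 2, 3, 4 in order")
    return " ".join(WORDS[m - 1] for m in word_numbers)


if __name__ == "__main__":
    print(position_of(5))               # "lawn" is always the second word
    print(build_phrase([1, 5, 9, 13]))  # phrase 1 of Table V
```

Storing 16 words rather than 32 phrases keeps the prompt data small, at the cost of assembling each phrase when it is needed.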

10.1.3.3  Order  of  phrase  prompting.  The  four-word  phrases  of  Table 
V  shall  be  presented  to  the  individual  in  a  random  order  where  the 
manner  in  which  these  phrases  are  reordered  shall  be  as  follows: 

a.  Each  of  the  32  phrases  shall  be  assigned  to  the  group 
number  designated  in  Table  V.  Hence  each  group  will  consist  of  a 
unique  set  of  four  phrases. 

b.  Within  each  group,  the  four  phrases  shall  be  randomly 
reordered  such  that  each  phrase  appears  exactly  once. 

c.  The  groups  shall  be  randomly  reordered  such  that  each  group 
appears  exactly  once. 

d.  The multiplicative congruential method of pseudo-random
number generation shall be used to reorder the eight groups and the
four phrases within each group (reference: D. Knuth, "The Art of
Computer Programming", Vol. II).
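
The reordering in steps a through d can be sketched as below. The specification names the multiplicative congruential method but gives no constants, so the multiplier 16807 and modulus 2^31 - 1 used here are assumptions (a commonly used pair), as are the class and function names. Group membership (phrases g, g+8, g+16, g+24 in group g) follows Table V.

```python
class Lehmer:
    """Multiplicative congruential generator: x <- (a * x) mod m.

    The constants a and m are assumed for illustration; the
    specification does not state which values were used.
    """

    def __init__(self, seed, a=16807, m=2**31 - 1):
        self.a, self.m = a, m
        self.x = (seed % m) or 1  # the state must never be zero

    def below(self, n):
        """Advance the generator and return an integer in 0..n-1."""
        self.x = (self.a * self.x) % self.m
        return self.x % n


def shuffled(items, rng):
    """Fisher-Yates shuffle driven by rng; every item appears exactly once."""
    out = list(items)
    for i in range(len(out) - 1, 0, -1):
        j = rng.below(i + 1)
        out[i], out[j] = out[j], out[i]
    return out


def prompt_order(seed):
    """Return the 32 phrase numbers: groups reordered, phrases reordered within each group."""
    rng = Lehmer(seed)
    order = []
    for g in shuffled(range(1, 9), rng):
        order.extend(shuffled([g, g + 8, g + 16, g + 24], rng))
    return order


if __name__ == "__main__":
    print(prompt_order(12345))
```

Because the groups stay contiguous, any run of four consecutive prompts drawn from one group uses each of the 16 words exactly once.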

10.1.4  Preprocessing  data.  The  algorithms  described  herein  require 
that  some  form  of  preprocessing  be  done  to  the  signal  at  the  output 
of the microphone. As a minimum this shall include all of the
filtering described in 3.7.1.1.1.2.2a(1) and may encompass some or
all of the processing described in 10.2.2.2.1. Therefore, the
preprocessing function shall provide filtered data (which may
additionally be regressed, normalized and/or quantized) from
fourteen filters every centisecond (10 milliseconds) in time after
the  completion  of  a  phrase  prompt.  The  preprocessing  function  shall 
also  make  available  a  filter  overload  indication  which  shall  be  set 
whenever  filter  saturation  occurs  and  which  shall  remain  set  until 
reset  as  indicated  within  the  algorithms.  Additionally,  the  gain  of 
the  filters  shall  be  adjustable  as  indicated  in  the  algorithms. 

10.1.5  Precision  requirements.  Unless  otherwise  specified,  the 
precision  requirements  indicated  herein  shall  be  maintained 
throughout  all  operations  in  the  algorithms  described.  The 
algorithms  are  specified  assuming  the  precision  afforded  by  a  16-bit 
computer  using  integer  arithmetic.  This  degree  of  precision  shall 
be  maintained  throughout. 

Unless  otherwise  noted  all  operations  may  be  truncated  to  the 
nearest  integer.  In  those  cases  where  rounding  shall  be  necessary  a 
notation  "rounded"  will  appear  in  the  margin.  In  some  instances 
significance  can  be  lost  if  intermediate  results  are  not  maintained 
to  the  double  precision  level  (i.e.  32-bit  integer).  These  cases 
will  be  denoted  by  "d.p."  in  the  margin  and  shall  be  evaluated  using 
enough  bits  such  that  neither  overflow  nor  underflow  occurs. 

10.2  Enrollment  algorithm.  The  enrollment  algorithm  shall  consist 
of  four  phases:  (1)  initialization,  (2)  generation  of  initial 
reference  data,  (3)  refinement  of  reference  data,  and  (4) 
termination.  Following  initialization,  phrases  are  selected  and 
prompted  such  that  speech  reference  file  data  is  generated  for  each 
of  the  16  possible  words  which  form  the  32  possible  phrases.  This 
initial  reference  file  data  is  then  refined  by  the  prompting  and 
processing  of  phrases  a  sufficient  number  of  times  such  that  each 
word  is  updated  a  minimum  of  four  times.  If  this  subsequent 
processing reveals that the initial reference file data was not
representative of the individual's normal speech, the words involved
will  be  reinitialized  and  the  refinement  process  repeated.  When  all 
words  have  been  sufficiently  refined,  the  resultant  reference  file 
data  will  be  saved  for  the  verification  attempts  which  the  entrant 
will  be  subsequently  undertaking. 

During  the  generation  and  refinement  of  the  reference  file  data,  the 
enrollment  operator  or  the  enrollee  shall  be  allowed  to  interrupt 
the  procedures  in  the  manner  described  in  10.2.5.  With  this 
exception,  the  enrollment  shall  proceed  in  the  sequence  described 
below.  Figure  6  is  a  flowchart  of  the  enrollment  algorithm  where 
the  paragraph  numbers  in  the  figure  correspond  to  those  in  the  text. 
The  notation  glossary  of  Table  VI  will  be  adhered  to  in  the 
following  description. 

Figure 6.  ENROLLMENT ALGORITHM

TABLE VI

ALGORITHM NOTATION GLOSSARY

Symbol                      Usage                                                  Major Reference

α                           Reference file updating parameter                      10.3.3.1.3
BESTSQ                      Optimum sequence parameter                             10.2.2.4, 10.2.3.2.3.2.3
c1, c2                      Regression coefficients                                10.2.2.2.1.1
e(L)                        Maximum filter energy at time L                        10.2.2.2.1.1
ε                           A decision score parameter                             10.3.3.1.2
EHAT                        Summed expected scanning error                         10.3.3.1.1
ep(j_n)                     Energy peak j for the nth word                         10.2.2.2.2
EPEAK                       Peak energy                                            10.2.2.2.3.1
es(L)                       Smoothed energy at time L                              10.2.2.2.2
ESE_m, ESE_m(n)             Reference data - expected scanning error               10.2.3.1, 10.2.4,
                            for word m                                             10.3.3.1.3
eseq                        Sequence error                                         10.2.2.4, 10.2.3.2.3.2.3
ESMIN_n(j_n)                Scanning error of the jth valley point                 10.2.3.2.3.1
                            for the nth word
ETH_m, ETH_m(n)             Initial expected relative energy of word m             10.2.2.4
ETOT                        Total energy in sequence of peaks                      10.2.2.4
EUSE                        Summed scanning error                                  10.3.3.1.1
EW                          Point pair error for a valley point couplet            10.2.3.2.3.2.1.2
f(i)                        Output from filter i                                   10.2.2.2.1
g(i)                        Regressed output from filter i                         10.2.2.2.1.1
γ1(i), γ2(i)                Regression vectors                                     10.2.2.2.1.1
h(i)                        Normalized output from filter i                        10.2.2.2.1.2
i                           Filter index (1 ≤ i ≤ 14)                              10.2.2.2.1
I                           Valley sequence index                                  10.2.3.2.3.2
IREG                        Registered phrase counter                              10.3.3.1.1
j                           General purpose index
j_n                         Index to the jth peak or valley                        10.2.2.4, 10.2.3.2.3.2
                            for the nth word
k                           Time sample index
k_I                         Index to the kth couplet of valley points              10.2.3.2.3.2
                            for the Ith and (I+1)th words
K                           Filter quantization parameter (1 ≤ K ≤ 7)              10.2.2.2.1.3
L                           Time in centiseconds since end of prompt
LTIME                       Parameter used to locate energy peaks                  10.2.2.2.2
LTIME_n                     Parameter used to locate valley points                 10.2.3.2.3.1
                            for the nth word
m, m(n)                     Word number (1 ≤ m ≤ 16) where m(n) emphasizes
                            the fact that word m is spoken as the nth
                            word in the phrase
MAX                         Parameter used to locate energy peaks                  10.2.2.2.2
MAXCUT                      Expected length of speech                              10.3.2.3
MIN_n                       Parameter used to locate valley points                 10.2.3.2.3.1
                            for the nth word
MODE                        Parameter used to locate energy peaks                  10.2.2.2.2
MODE_n                      Parameter used to locate valley points                 10.2.3.2.3.1
                            for the nth word
MXD                         Parameter used to compute point pair errors            10.2.3.2.3.2.1.2
n                           Position of a word within a phrase (1 ≤ n ≤ 4)         -
η(i,L)                      Quantized output for filter i at time L                10.2.2.2.1.3
NOTREG                      Mis-registered phrase counter                          10.2.3.3, 10.3.3.3
NPK                         Energy peak counter                                    10.2.2.2.2, 10.2.2.4
NTC_m, NTC_m(n)             Number of times reference data for word m              10.2.2.5, 10.2.3.1,
                            has been calculated                                    10.2.3.4
OPTSE_n                     Scanning error of the nth valley point or              10.2.3.2.3.2.3
                            energy of the nth peak in the optimum sequence
OPTT_n                      Time of the nth valley point or the nth peak           10.2.3.2.3.2.3
                            in the optimum sequence
PDEC                        Decision score                                         10.3.3.1.2
PPE                         Scanning errors of couplets for the valley             10.2.3.2.3.2.1.2
                            points from words I and I+1
PPEW                        Point pair errors of couplets for the valley           10.2.3.2.3.2.1.2
                            points from words I and I+1
PPT                         Times of couplet for the valley points                 10.2.3.2.3.2.1.2
                            from words I and I+1
φ1(i), φ2(i)                Regression vectors                                     10.2.2.2.1.1
ψ(i,k)                      Quantization thresholds                                10.2.2.2.1.3
r_m(i,k)                    Reference data - expected quantized filter i           10.2.3.1, 10.2.4,
                            output for time sample k, word m                       10.3.3.1.3
SE_n(L)                     Scanning error for the nth word at time L              10.2.3.2.2
sESE                        Summed expected scanning error for a phrase            10.3.2.1, 10.3.3.1.1
sESE_m                      Summed expected scanning error for word m              10.2.2.5, 10.2.3.4
SERR                        Parameter used to compute sequence error               10.2.2.4
sr_m(i,k), sr_m(n)(i,k)     Summed pattern data for filter i,                      10.2.2.5, 10.2.3.4
                            time sample k and word m
ST                          Parameter used to compute sequence error               10.2.2.4
sΔTref_m, sΔTref_m(n)       Summed expected time interval between word m           10.2.2.5, 10.2.3.4
                            prompted as the nth word and the (n+1)th word
SPEECH                      Start of speech indicator                              10.2.2.2.3.1
t                           General purpose parameter representing time
ΔT                          Time interval between valley points                    10.2.3.2.3.2.1.1
Δte_m                       Initial expected time interval between word m          10.2.2.4
                            prompted as the nth word and the (n+1)th word
tp(j_n)                     Time of the jth peak for the nth word                  10.2.2.2.2
ΔTref_m, ΔTref_m(n)         Reference data - expected time interval between        10.2.3.1, 10.2.4,
                            word m prompted as the nth word and the                10.3.3.1.3
                            (n+1)th word
TSMIN_n(j)                  Time of the jth valley point for the nth word          10.2.3.2.3.1
                            Filter normalization parameter                         10.2.2.2.1.2
udt_j(n)                    Updating time interval between the nth and
                            (n+1)th words of the jth registered phrase
upt_j(i,k,n)                Updating pattern data for filter i, time sample k      10.3.3.1.1
                            for the nth word of the jth registered phrase
use_j(n)                    Updating scanning error data for the nth word          10.3.3.1.1
                            of the jth registered phrase
uwd_j(n)                    Number of the word prompted as the nth word            10.3.3.1.1
                            of the jth registered phrase
x                           General purpose variable                               -



SPECIFICATION  NUMBER 
BISS-ENC-14000 
15  May  1980 


10.2.1  Enrollment  initialization.  Initialization  shall  consist  of 
the  following: 

a.  The  enrollment  function  shall  obtain  a  unique 
identification  (ID)  number  for  the  enrollee  which  shall  associate 
the  authorization  file  with  the  speech  reference  file  (generated 
here)  through  the  coded  badge.  This  ID  number  shall  also  be 
available  to  the  enroller. 

b.  The filter gain G, 3.7.1.1.1.1.3, which has possible values
of 1, 2, 4, ... up to a maximum filter gain shall be initialized to
two and the filter overload indicator shall be cleared.

c.  The number of times reference data has been calculated for
each of the 16 possible words shall be set to zero, i.e., NTC_m = 0
for m = 1, 2, ..., 16.

d.  A  list  of  32  phrases  (eight  groups)  in  random  order  shall 
be  created  as  described  in  10.1.3.3.  When  the  actual  enrollment 
process  begins,  the  first  phrase  in  this  list  shall  be  the  phrase 
which  is  prompted  to  the  enrollee. 

Upon  completion  of  initialization  the  procedures  of  10.2.2  shall  be 
followed. 

10.2.2  Generation  of  initial  reference  file  data.  The  generation 
of  initial  reference  file  data  consists  of  locating  peaks  in  the 
speech  energy  and  selecting  from  those  peaks  four  peaks 
corresponding  to  the  four  spoken  words  such  that  the  time  separation 
between  peaks  falls  within  expected  intervals.  The  data  selected  to 
initialize  the  reference  file  is:  (1)  the  data  compressed  filter 
outputs  centered  at  the  time  of  the  peaks  and  (2)  the  time  intervals 
between  the  peaks.  This  process  is  repeated  until  data  has  been 
generated  in  this  manner  for  each  of  the  16  possible  words  used  to 
create  all  phrases. 

10.2.2.1  Phrase  prompt  initialization.  Prior  to  the  processing  of 
speech  data  for  a  given  prompted  phrase,  the  following 
initialization  shall  be  performed. 

a.  The  parameters  MAX  and  LTIME,  used  to  locate  peak  energies, 
shall  be  set  to  125  and  zero  respectively.  The  mode  used  to  locate 
peaks  shall  be  set  to  search-for-valley.  The  number  of  peaks  found, 
NPK,  shall  be  set  to  zero. 

b.  The  parameters  EPEAK  and  SPEECH  used  to  find  the  end  of 
speech  shall  be  set  to  zero  and  not-started  respectively. 

c.  The parameter BESTSQ used to locate an optimum sequence
of peaks shall be set to the value of the largest positive integer
possible for the computer.

d.  The  phrase  indicated  by  previous  processing  shall  be 
prompted  through  the  prompting  unit. 

Upon  completion  of  this  initialization,  speech  processing  shall 
begin  as  per  10.2.2.2. 

10.2.2.2  Speech  processing.  For  each  centisecond  following  the  end 
of  prompting,  while  the  enrollee  is  repeating  the  prompted  phrase 
and  until  end-of-speech  is  declared,  10.2.2.2.3,  the  following 
procedures  shall  be  implemented. 

10.2.2.2.1  Data compression.  Data compression consists of
accepting the filtered data, 3.7.1.1.1.2.2a(1), and regressing,
normalizing and quantizing it in the manner described herein. The
data to be compressed consist of one data point f (with at least 12
bit quantization) from each of fourteen filters, occurring every
centisecond. The output from the data compression shall be the
maximum filter energy e(L) and the quantized data η(i,L) for
i = 1, 2, ..., 14 at time L = 1, 2, ... in centiseconds.

10.2.2.2.1.1  Regression.  The inputs f(i) for i = 1, 2, ..., 14 shall be
regressed to eliminate slope and curvature of the spectrum in the
following manner.

a.  If any one of the 14 inputs is less than one, the
corresponding f(i) shall be set to one.

b.  Regression coefficients shall be computed as follows:

$$c_1 = 7 + \sum_{i=1}^{14} \frac{f(i)\,\phi_1(i)}{32768} \qquad \text{(d.p.)}$$

$$c_2 = 7 + \sum_{i=1}^{14} \frac{f(i)\,\phi_2(i)}{32768} \qquad \text{(d.p.)}$$

where φ1(i) and φ2(i) shall be as provided in Table VII.

c.  The inputs shall be regressed as follows:

$$g(i) = f(i) + \frac{c_1\,\gamma_1(i)}{32768} + \frac{c_2\,\gamma_2(i)}{32768} \qquad \text{(d.p.)}$$

TABLE VII

REGRESSION VECTORS

  i      φ1(i)     φ2(i)     γ1(i)     γ2(i)

  1     -32767     32767      6085     -7607
  2     -27726     17644      5149     -4096
  3     -22685      5041      4212     -1170
  4     -17644     -5040      3276      1169
  5     -12602    -12602      2340      2925
  6      -7561    -17644      1403      4095
  7      -2520    -20164       467      4681
  8       2521    -20164      -468      4681
  9       7562    -17644     -1404      4095
 10      12603    -12602     -2341      2925
 11      17644     -5040     -3277      1169
 12      22685      5041     -4213     -1170
 13      27726     17644     -5149     -4096
 14      32767     32767     -6086     -7607

for i = 1, 2, ..., 14

where γ1(i) and γ2(i) shall be as provided in Table VII.

d.  The maximum filter energy at the current time L shall be
taken as the maximum of the fourteen values of g(i) and will be
denoted e(L), i.e.,

$$e(L) = \max_{1 \le i \le 14} \{ g(i) \}$$
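
The regression steps a through d can be sketched in integer arithmetic as follows. Several points are assumptions rather than statements of the specification: the four columns of the regression vector table are taken, in order, as the two vectors used to form the coefficients (`PHI1`, `PHI2`) and the two used to regress the inputs (`GAM1`, `GAM2`); both regression terms are added; the sum is accumulated in full precision before dividing by 32768; and division truncates toward zero. The function names are hypothetical.

```python
# Regression vectors, columns of the regression vector table (i = 1..14).
# The assignment of columns to roles is an assumption of this sketch.
PHI1 = [-32767, -27726, -22685, -17644, -12602, -7561, -2520,
        2521, 7562, 12603, 17644, 22685, 27726, 32767]
PHI2 = [32767, 17644, 5041, -5040, -12602, -17644, -20164,
        -20164, -17644, -12602, -5040, 5041, 17644, 32767]
GAM1 = [6085, 5149, 4212, 3276, 2340, 1403, 467,
        -468, -1404, -2341, -3277, -4213, -5149, -6086]
GAM2 = [-7607, -4096, -1170, 1169, 2925, 4095, 4681,
        4681, 4095, 2925, 1169, -1170, -4096, -7607]


def idiv(a, b):
    """Integer division truncating toward zero (assumed machine behavior)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def regress(f):
    """Return (g, e): the 14 regressed filter outputs and their maximum."""
    f = [max(v, 1) for v in f]                                    # step a
    c1 = 7 + idiv(sum(v * p for v, p in zip(f, PHI1)), 32768)     # step b
    c2 = 7 + idiv(sum(v * p for v, p in zip(f, PHI2)), 32768)
    g = [v + idiv(c1 * a, 32768) + idiv(c2 * b, 32768)            # step c
         for v, a, b in zip(f, GAM1, GAM2)]
    return g, max(g)                                              # step d


if __name__ == "__main__":
    sloped = [1000 + 50 * i for i in range(1, 15)]  # a purely sloped spectrum
    g, e = regress(sloped)
    print(g, e)
```

Under these assumptions a purely sloped input comes out nearly flat, which is the stated purpose of the regression.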


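10.2.2.2.1.2  Normalization.  The regressed values g(i) shall be
normalized as follows:

$$h(i) = \frac{32768 \cdot g(i)}{\sum_{j=1}^{14} g(j)} \qquad \text{(d.p., rounded)}$$

where the denominator, the filter normalization parameter, is the sum
of the fourteen regressed values.

10.2.2.2.1.3  Quantization.  The normalized values h(i) shall be
quantized as follows:

$$\eta(i,L) = \begin{cases} 0 & \text{for } h(i) < \psi(i,1) \\ K & \text{for } \psi(i,K) \le h(i) < \psi(i,K+1), \quad K = 1, 2, \ldots, 6 \\ 7 & \text{for } \psi(i,7) \le h(i) \end{cases}$$

where ψ(i,K) shall be as provided in Table VIII.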
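
A minimal sketch of normalization and quantization follows. It assumes the normalizing quantity is the sum of the fourteen regressed values and that the lower bound of each quantization interval is inclusive. The threshold values below are placeholders invented for the example, not the values of Table VIII, and the function names are hypothetical.

```python
import bisect

# Placeholder thresholds for one filter (K = 1..7). These are NOT the
# Table VIII values; they only make the example runnable.
EXAMPLE_THRESHOLDS = [1000, 1500, 2000, 2500, 3000, 3500, 4000]


def normalize(g):
    """Scale the regressed values so they sum to about 32768 (rounded division).

    Assumes the sum of g is positive.
    """
    total = sum(g)
    return [(32768 * v + total // 2) // total for v in g]


def quantize(h, thresholds):
    """Map each h(i) to a level 0..7 using that filter's ascending thresholds.

    bisect_right counts the thresholds that are <= h(i), which gives
    0 below the first threshold, K between thresholds K and K+1,
    and 7 at or above the seventh.
    """
    return [bisect.bisect_right(t, v) for v, t in zip(h, thresholds)]


if __name__ == "__main__":
    h = normalize([100] * 14)
    print(h)
    print(quantize(h, [EXAMPLE_THRESHOLDS] * 14))
```

Each quantized value fits in three bits, so a 14-filter frame compresses to 42 bits under this scheme.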


10.2.2.2.2  Energy peaks.  If 13 centiseconds have not yet elapsed
since the end of prompting (i.e., if L < 13), the end-of-speech test
as per 10.2.2.2.3 shall be performed. If, however, L ≥ 13, the
energy peaks of the smoothed energy function shall be located. The
smoothed energy es(L) at time L shall be defined as

$$es(L) = \sum_{K=1}^{13} \frac{e(L-13+K)}{2}$$


The locating of peaks in this function shall be done by use of two
modes (search-for-valley and search-for-peak) where the mode shall
be that which was determined at the immediately previous time
sample, time = (L-1), or during initialization if L = 13. Based on the
current mode the values to which the parameters MAX, LTIME, and MODE
shall be updated for the next centisecond (at L+1) shall be as
presented in Table IX. As an example, if es(L) is less than the
current value of MAX and the current mode is search-for-valley, MAX
shall be set to es(L), LTIME shall be set to L and the mode for the
next time at L+1 shall remain as search-for-valley.

If  the  current  mode  is  search-for-peak  and  (MAX*10)/12  is  greater 
than  or  equal  to  es(L)  then  a  peak  is  said  to  be  located.  In  this 
case,  prior  to  the  updating  indicated  in  the  table  and  then  only  if 
the  unupdated  value  of  MAX  is  greater  than  or  equal  to  150,  the  peak 
shall  be  saved.  If  the  peak  is  to  be  saved,  NPK  shall  be 
incremented  by  one  and  the  peak  value  and  location  shall  be  saved  in 
arrays  ep  and  tp  respectively  as  follows: 

ep(NPK)  =  MAX/50  (rounded) 

tp(NPK)  =  LTIME 

Provision  shall  be  made  for  up  to  90  such  peaks  being  found. 
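
The two-mode peak search can be sketched as a small state machine. The class and attribute names are hypothetical; the initial values (MAX = 125, LTIME = 0, search-for-valley), the 12/10 and 10/12 ratios, the threshold of 150, the scaling by 50 and the limit of 90 peaks come from the text. Integer division for the ratios and add-half rounding for MAX/50 are assumptions.

```python
SEARCH_VALLEY, SEARCH_PEAK = "search-for-valley", "search-for-peak"


class PeakFinder:
    """Tracks MAX, LTIME and MODE and records peaks of the smoothed energy."""

    def __init__(self):
        self.max, self.ltime, self.mode = 125, 0, SEARCH_VALLEY
        self.ep, self.tp = [], []  # saved peak values and times

    def step(self, es, L):
        """Process the smoothed energy es at time L (centiseconds)."""
        if self.mode == SEARCH_VALLEY:
            if es < self.max:
                # Still descending: follow the valley down.
                self.max, self.ltime = es, L
            elif es >= (self.max * 12) // 10:
                # Energy has risen well clear of the valley.
                self.max, self.ltime, self.mode = es, L, SEARCH_PEAK
        else:
            if es > self.max:
                # Still rising: follow the peak up.
                self.max, self.ltime = es, L
            elif es <= (self.max * 10) // 12:
                # A peak is located; save it only if it is large enough.
                if self.max >= 150 and len(self.ep) < 90:
                    self.ep.append((self.max + 25) // 50)  # MAX/50, rounded
                    self.tp.append(self.ltime)
                self.max, self.ltime, self.mode = es, L, SEARCH_VALLEY


if __name__ == "__main__":
    finder = PeakFinder()
    for L, es in enumerate([100, 200, 400, 600, 450, 300], start=13):
        finder.step(es, L)
    print(finder.ep, finder.tp)
```

The hysteresis between the two ratios keeps small ripples in the smoothed energy from being counted as separate peaks.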

Once  the  energy  peak  check  is  complete  for  the  current  time,  the 
end-of-speech  test  as  per  10.2.2.2.3  shall  be  performed. 

10.2.2.2.3  End-of-speech.  At  each  centisecond  in  the  speech 
processing  the  end-of-speech  test  described  herein  shall  be 
performed.  The  criterion  to  be  met  is  either  one  of  the  following 
two: 


(1)  one  hundred  centiseconds  of  valid  speech  exists  prior  to  a 
speech  silence; 

(2)  speech  has  continued  for  400  centiseconds  since  the  end  of 
prompting. 

The  testing  to  determine  if  one  of  these  two  criteria  is  met  shall 
be  as  described  in  10.2.2.2.3.1  and  10.2.2.2.3.2.  If  end-of-speech 
is  not  declared,  speech  processing  shall  continue.  If  end-of-speech 
is  declared,  subsequent  processing  shall  be  the  determination  of 
whether  the  data  is  suitable  (i.e.  10.2.2.3  if  generation  of  initial 
reference  file  data  is  being  implemented;  10.2.3.3  if  refinement  of 
reference  file  data  is  being  implemented;  or  10.3.3  if  a 
verification  is  being  implemented). 

TABLE IX

ENERGY PEAK TABLE

Current Mode        Condition                      MAX     LTIME   MODE
Search for Valley   es(L) < MAX                    es(L)   L       *
Search for Valley   MAX ≤ es(L) < (MAX*12)/10      *       *       *
Search for Valley   (MAX*12)/10 ≤ es(L)            es(L)   L       Peak
Search for Peak     es(L) > MAX                    es(L)   L       *
Search for Peak     MAX ≥ es(L) > (MAX*10)/12      *       *       *
Search for Peak     (MAX*10)/12 ≥ es(L)            es(L)   L       Valley

*Unchanged

10.2.2.2.3.1 Speech duration. The determination of whether
end-of-speech should be declared due to the speech duration test rests upon
three dependent conditions, Figure 7. First the fact that speech
has begun must be ascertained. Having determined that speech has
begun, bands of silence of a minimum of 60 centisecond duration are
sought. Then, to determine if a given band of silence is indeed at
the end-of-speech, the past history of the maximum filter energy
function is investigated to determine if valid speech preceded the
silence. If any one of these conditions is not met, processing
shall proceed to section 10.2.2.2.3.2 with the speech duration test
having failed. If all conditions are met, end-of-speech shall be
declared and processing shall proceed as indicated previously. The
determination of whether these conditions have been met shall be as
follows.


10.2.2.2.3.1.1  Speech  start.  Speech  is  recognized  as  having 
started  by  the  fact  that  e(t),  the  maximum  filter  energy  function, 
has  attained  a  value  of  100  or  more  at  some  point  in  time.  If 
speech  start  has  already  been  recognized  as  per  this  test, 
processing  shall  proceed  to  10.2.2.2.3.1.2.  If  speech  start  has  not 
yet been ascertained but at this time L, e(L) ≥ 100, speech start
shall be declared. In doing so, SPEECH shall be set to started and
EPEAK shall be set to this value of e(L) as a first step in
maintaining EPEAK as the maximum of e(t) for t ≤ L. Irrespective of
whether  speech  start  is  declared  at  this  time,  processing  shall 
proceed  to  10.2.2.2.3.2. 

10.2.2.2.3.1.2  Speech  silence.  Speech  silence  is  said  to  occur 
when  the  maximum  energy  function  e(t)  falls  below  the  speech  silence 
threshold EPEAK/8. As EPEAK is to be the maximum of e(t) for t ≤ L,
EPEAK shall be updated to the current value of e(L) if e(L) > EPEAK.
If at any time e(L) is seen to fall below the current value of the
silence threshold, the time at the previous interval (i.e. L-1)
shall be recorded as L2 and a count shall be started to count the
number  of  consecutive  maximum  filter  energy  samples  that  remain 
below  this  threshold.  Whenever  a  span  of  speech  silence  is 
encountered  which  is  60  centiseconds  or  more  in  length  (i.e.  the 
count  becomes  greater  than  or  equal  to  60),  speech  silence  of 
sufficient  length  is  said  to  exist  and  processing  shall  proceed  to 
valid  speech,  10.2.2.2.3.1.3.  If  at  the  current  time,  speech 
silence  does  not  exist  or  is  of  insufficient  length,  processing 
shall  proceed  to  10.2.2.2.3.2. 

10.2.2.2.3.1.3  Valid  speech.  To  preclude  the  possibility  of 
accepting  as  speech  some  accidental  noise  (such  as  is  caused  by  the 
microphone  being  bumped),  speech  is  only  said  to  be  valid  if  before 
speech  silence,  a  span  of  maximum  filter  energy  can  be  found  which 
lasts  for  at  least  100  centiseconds  and  contains  no  speech  silence 
intervals longer than 59 centiseconds. Hence, once a band of speech
silence is located as per 10.2.2.2.3.1.2, the e(t) function for
t < L2 (where L2 is the time before the speech silence) shall be
investigated in the following manner. Letting L1 = L2 - 1, L2 - 2,...
values of e(L1) shall be compared with EPEAK/8. If for some L1,
e(L1) < EPEAK/8, a count shall be started to count the number of
consecutive e(L1) samples of speech silence. If any such count
reaches 60, valid speech is not said to exist in which case SPEECH
shall be reset to not-started and processing shall proceed to
maximum speech duration, 10.2.2.2.3.2. If, however, the current
value of e(L1) ≥ EPEAK/8, the time length, L2 - L1, shall be compared
to 100. If L2 - L1 ≥ 100, valid speech exists and end-of-speech
shall be declared with processing continuing as has been indicated
in 10.2.2.2.3. If L2 - L1 < 100, no determination as to valid speech
can be made and previous values of e(L1) shall be further
investigated in the described manner. If no such determination can
be made by the time L1 has been decremented to L1 = 1, valid speech is
said not to exist, in which case SPEECH shall be reset to not-started
and processing shall proceed to 10.2.2.2.3.2.

10.2.2.2.3.2  Maximum  speech  duration.  No  speech  processing  shall 
continue  for  longer  than  400  centiseconds  of  speech  data.  If  the 
amount  of  time  since  the  end  of  prompting  has  reached  400 
centiseconds  (i.e.,  if  L  =  400),  end-of-speech  is  summarily  declared 
and  speech  processing  shall  cease. 
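
The end-of-speech logic of 10.2.2.2.3.1 and 10.2.2.2.3.2 can be sketched as follows. This is a minimal illustration, not the delivered implementation: the class and helper names are hypothetical, the threshold EPEAK/8 is compared without integer truncation (an assumption), and the names L2 and L1 follow the reading of the text in which L2 is the time before the silence and L1 is the backward scan index.

```python
class EndOfSpeechDetector:
    """Feed one maximum-filter-energy sample e(L) per centisecond to step()."""

    def __init__(self):
        self.e = []              # e[0] holds e(1)
        self.epeak = 0
        self.started = False     # the SPEECH flag
        self.silence_count = 0
        self.l2 = None           # time just before the current silence

    def step(self, value):
        self.e.append(value)
        L = len(self.e)
        if not self.started:
            if value >= 100:                     # speech start
                self.started = True
                self.epeak = value
                self.silence_count = 0
        else:
            self.epeak = max(self.epeak, value)
            if value < self.epeak / 8:           # speech silence
                if self.silence_count == 0:
                    self.l2 = L - 1
                self.silence_count += 1
            else:
                self.silence_count = 0
            if self.silence_count >= 60 and self._valid_speech():
                return True
        return L >= 400                          # maximum speech duration

    def _valid_speech(self):
        threshold = self.epeak / 8
        run = 0
        for l1 in range(self.l2 - 1, 0, -1):     # L1 = L2-1, L2-2, ..., 1
            if self.e[l1 - 1] < threshold:
                run += 1
                if run >= 60:
                    break                        # a 60 cs gap: not valid speech
            else:
                run = 0
                if self.l2 - l1 >= 100:
                    return True
        self.started = False                     # reset SPEECH to not-started
        return False


def first_end_of_speech(samples):
    """Return the centisecond at which end-of-speech is declared, or None."""
    detector = EndOfSpeechDetector()
    for L, value in enumerate(samples, start=1):
        if detector.step(value):
            return L
    return None
```

With 120 centiseconds of strong energy followed by silence, the sketch declares end-of-speech once the silence has lasted 60 centiseconds; a short bump followed by silence is rejected and the 400 centisecond limit applies instead.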

10.2.2.3  Suitable  data.  Once  end-of-speech  is  declared,  the 
results  of  the  speech  processing  shall  be  investigated  so  as  to 
determine  if  the  data  is  suitable  for  use  in  creating  initial 
reference  data.  If  the  data  is  deemed  suitable,  the  selection  of 
four  peaks  as  per  10.2.2.4  shall  be  implemented.  If  the  data  is 
deemed  not  suitable,  all  data  resulting  from  the  processing  of  the 
current  phrase  shall  be  discarded,  this  same  phrase  shall  be 
reprompted  as  the  next  phrase,  and  processing  shall  be  resumed  with 
phrase  prompt  initialization,  10.2.2.1.  This  procedure  postpones 
the  prompting  of  the  next  phrase  in  the  list  created  at  the  start  of 
enrollment,  10.2.1.d,  until  speech  processing  has  yielded  data 
deemed  suitable  by  this  test.  There  shall  be  no  limit  to  the  number 
of times a given phrase may be reprompted in this manner.

Two conditions may arise, either of which is sufficient to declare
the data not suitable. The first condition is a filter overload.
If the filters yielded an overload status condition, then in
addition to the above, the procedure 10.2.2.3.1 shall be
implemented. If no filter overload occurred but fewer than four
peaks were found by 10.2.2.2.2, i.e., NPK < 4, then the second
condition has been met and the gain adjustment test of 10.2.2.3.2
shall be implemented, where mis-registered shall be defined as
having found no more than two peaks, i.e., NPK ≤ 2.

10.2.2.3.1  Filter  overloads.  Whenever  a  filter  overload  status  is 
sensed,  two  messages  shall  be  sent  to  the  preprocessor.  The  first 
shall  be  to  cause  the  filters  to  be  reset.  The  second  shall  be  to 
reduce  the  filter  gain,  G,  by  a  factor  of  two  to  a  level  no  less 
than one (i.e. if G=1, no further gain reduction is possible).

10.2.2.3.2 Filter gain adjustment. Whenever a new phrase is to be
prompted  and  no  filter  overload  has  occurred,  the  following  test 
shall  be  performed  to  determine  if  the  filter  gain  G  shall  be 
adjusted  or  if  the  instruction  "louder  please"  shall  be  prompted  to 
the  enrollee.  This  decision  is  based  on  the  value  EPEAK,  the 
maximum  value  of  e(L),  10.2.2.2.1.1.d,  used  to  determine  when  end- 
of-speech  was  declared. 

a.  If  EPEAK  >  1200,  G  shall  be  decreased  by  a  factor  of  two  to 
a  level  no  less  than  one. 

b.  If EPEAK ≤ 1200 and if the phrase did not mis-register then

(1)  If  EPEAK/G  <  150  and  the  instruction  "louder  please" 
has  not  yet  been  prompted  during  the  entry  attempt,  "louder  please" 
shall  be  prompted.  ("louder  please"  shall  be  prompted  no  more  than 
once  per  entry  attempt.) 

(2)  If  EPEAK  <  400  and  "louder  please"  is  not  to  be 
prompted  at  this  time,  G  shall  be  increased  by  a  factor  of  two  to  a 
level  no  greater  than  the  maximum  filter  gain. 
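
The gain adjustment test can be sketched as a single decision function. This is a hedged illustration: the function name, the argument names, and the value of `max_gain` are hypothetical (the text refers only to "the maximum filter gain"), and the branch for case b is taken whenever EPEAK does not exceed 1200.

```python
def adjust_gain(epeak, gain, mis_registered, louder_prompted, max_gain=64):
    """Return (new_gain, prompt_louder) following cases a and b above."""
    if epeak > 1200:
        # Case a: halve the gain, but never below one.
        return max(1, gain // 2), False
    if not mis_registered:
        # Case b(1): ask for more volume at most once per entry attempt.
        if epeak / gain < 150 and not louder_prompted:
            return gain, True
        # Case b(2): double the gain, but never above the maximum.
        if epeak < 400:
            return min(max_gain, gain * 2), False
    return gain, False
```

For instance, `adjust_gain(300, 4, False, False)` requests the "louder please" prompt, while the same call with `louder_prompted=True` doubles the gain instead.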

10.2.2.4 Selection of four peaks. If the speech processing
yielded data deemed suitable as per 10.2.2.3, then the
peak data saved as per 10.2.2.2.2 shall be considered so as to find
four peaks representative of the four prompted words in the phrase.
Table X presents values of Δte_m(n) (initial expected time intervals
between word m prompted as the nth word in the phrase and whatever
word was prompted next) and of ETH_m(n) (the expected relative energy
of word m, the nth word in the phrase). For the particular words
used in the phrase just prompted, the appropriate values of Δte_m(n)
and ETH_m(n) shall be used. The peaks of 10.2.2.2.2 were saved in
arrays tp(j) and ep(j) where 1 ≤ j ≤ NPK. From these NPK peaks all
possible combinations of selecting four peaks j1, j2, j3 and j4 such
that 1 ≤ j1 < j2 < j3 < j4 ≤ NPK shall be considered. If the
combination (or sequence) currently under consideration meets the
following time separation criterion:

\[ \Delta te_{m(n)} / 2 < tp(j_{n+1}) - tp(j_n) < 2 \cdot \Delta te_{m(n)} \]

for  all  n  =  1,  2,  and  3  then  the  sequence  error  for  the  combination 
shall  be  computed  as  follows: 

\[ eseq = \sum_{n=1}^{4} \left( ST_n + SERR_n \right) \]

where

\[ ST_n = \left[ tp(j_{n+1}) - tp(j_n) - \Delta te_{m(n)} \right]^2 / 16 \quad \text{(rounded)} \]

\[ X_n = \left\{ 16 \cdot \left[ 128 \cdot ep(j_n) - ETH_{m(n)} \cdot ETOT \right] \right\} / ETOT \quad \text{(d.p.)} \]

\[ SERR_n = X_n^2 / \text{[divisor illegible]} \quad \text{(d.p., rounded)} \]

and

\[ ETOT = \sum_{n=1}^{4} ep(j_n) \]

TABLE X

EXPECTED TIME INTERVALS AND RELATIVE ENERGIES

 m    Δte_m   ETH_m
 1    40      42
 2    43      50
 3    46      19
 4    44      45
 5    43      33
 6    48      20
 7    48      41
 8    46      29
 9    42      29
10    43      32
11    44      23
12    52      40
13            20
14            22
15            41
16            17

SPECIFICATION NUMBER
BISS-ENC-140Q0
15 May 1980



If  eseq  can  be  computed  and  if  it  is  less  than  the  current  value  of 
BESTSQ (initialized as per 10.2.2.1.c), then BESTSQ shall be set to
eseq and the current sequence shall be saved as the optimum thus far
i.e., OPTTn = tp(jn) and OPTSEn = ep(jn) for n = 1, 2, 3, 4. If
after  all  such  possible  combinations  have  been  so  considered  and  if 
BESTSQ  resulted  in  a  value  greater  than  625,  all  data  resulting  from 
the  processing  of  this  phrase  shall  be  discarded,  the  filter  gain 
adjustment  test  as  per  10.2.2.3.2  shall  be  implemented,  where  this 
phrase shall be deemed not mis-registered, this same phrase shall be
reprompted  and  the  processing  shall  be  repeated  starting  as 
described  in  10.2.2.1.  There  shall  be  no  limit  to  the  number  of 
times  this  phrase  shall  be  reprompted  in  order  to  obtain  an 
acceptable  optimum  sequence.  If  an  acceptable  optimum  sequence  is 
found, i.e., if BESTSQ ≤ 625, the procedure of 10.2.2.5 shall be
implemented. 
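
The enumeration of four-peak combinations and the time separation criterion of 10.2.2.4 can be sketched as below. Only the criterion is shown; the sequence error is not computed here. The function name and the argument layout (a list of peak times and the three Δte values for the first three prompted words) are assumptions made for illustration.

```python
from itertools import combinations


def time_ok_combinations(tp, dte):
    """Return the 1-based index tuples (j1, j2, j3, j4) that pass the criterion.

    tp  -- peak times, tp[0] being the time of peak j = 1
    dte -- expected intervals for the first three prompted words
    """
    passing = []
    for combo in combinations(range(len(tp)), 4):   # j1 < j2 < j3 < j4
        gaps = [tp[combo[n + 1]] - tp[combo[n]] for n in range(3)]
        if all(dte[n] / 2 < gaps[n] < 2 * dte[n] for n in range(3)):
            passing.append(tuple(j + 1 for j in combo))
    return passing
```

With five peaks at times 10, 50, 55, 95 and 140 and expected intervals 40, 43 and 46, two of the five possible combinations pass.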

10.2.2.5  Creating  initial  reference  file  data.  Once  an  acceptable 
optimum  sequence  has  been  obtained  as  per  10.2.2.4,  the  initial 
reference  file  data  shall  be  generated  as  follows.  Using  the 
optimum sequence data OPTTn and OPTSEn for n = 1, 2, 3, 4, then, for
the four words used in the phrase just prompted:

a.  The pattern data for word m(n) shall be saved as follows:

sr_m(n)(i,k) = q(i, OPTTn - 7 + 2k)

for i = 1, 2,...14
k = 1, 2,...6
n = 1, 2, 3, 4

and where q(i,t) is the quantized data generated at time t for
filter i as per 10.2.2.2.1.3.

b.  The expected time intervals shall be saved as

sΔTref_m(n) = OPTT(n+1) - OPTT(n)
for n = 1, 2, 3

c.  The expected scanning error shall be set to zero:

sESE_m(n) = 0

d.  The number of times the reference data has been calculated
shall be set to one: NTCm(n) = 1

With  this  complete,  the  test  of  whether  generation  of  initial 
reference  data  is  complete  shall  be  performed  as  per  10.2.2.6. 
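
Steps a through d can be sketched as below. This is an illustrative sketch only: the function name is hypothetical, the quantized data is passed as a callable `q(i, t)`, and results are returned per phrase position n rather than stored per word m.

```python
def initial_reference(q, optt):
    """Build initial reference data from the four optimum times OPTT1..OPTT4.

    q    -- callable q(i, t): quantized data of filter i (1..14) at time t
    optt -- list of the four optimum times
    """
    # a. Pattern data: six samples, two centiseconds apart, around each time.
    sr = [[[q(i, optt[n] - 7 + 2 * k) for k in range(1, 7)]
           for i in range(1, 15)]
          for n in range(4)]
    # b. Expected time intervals between consecutive words.
    s_dtref = [optt[n + 1] - optt[n] for n in range(3)]
    # c. Expected scanning error starts at zero.
    s_ese = [0] * 4
    # d. Each word has now been calculated once.
    ntc = [1] * 4
    return sr, s_dtref, s_ese, ntc
```

The six samples for each word fall at offsets -5, -3, -1, +1, +3 and +5 centiseconds from the optimum time.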

10.2.2.6  Initial  reference  data  complete.  The  generation  of 
initial  reference  data  is  not  said  to  be  complete  until  data  has 
been  computed  for  each  of  the  possible  16  words  at  least  one  time. 
Therefore, if NTCm < 1 for any m = 1, 2,...16 the generation of
initial reference file data, 10.2.2, shall continue. If, however,
NTCm ≥ 1 for all m = 1, 2,...16, a complete set of initial reference
data  is  available  and  the  procedures  of  10.2.3  shall  be  implemented. 
In  either  case,  another  phrase  will  be  prompted  and  as  such,  the 
gain  adjustment  test  as  per  10.2.2.3.2  shall  be  implemented,  where 
this  phrase  shall  be  deemed  not  mis-registered.  Upon  completion  of 
the  test,  the  appropriate  procedure,  10.2.2  or  10.2.3,  shall  be 
implemented  where  the  phrase  to  be  prompted  shall  be  the  next  phrase 
in the list created as per 10.2.1.d. If all phrases in this list
have  been  used,  a  new  list  shall  be  created  as  per  10.1.3.3  where 
the  first  phrase  in  this  new  list  shall  be  that  which  is  prompted. 

10.2.3  Refinement  of  reference  file  data.  The  refinement  of 
reference  file  data  consists  of  the  processing  of  phrases  containing 
the  16  words  such  that  each  word  is  processed  at  least  four  times. 
The  data  for  each  processed  word  is  then  compared  with  the  already 
existing  reference  file  data  and  if  comparable,  is  used  to  create  an 
average  which  becomes  the  reference  file  data  used  in  subsequent 
verification  attempts.  If  the  processed  phrase  data  is  not 
comparable to the already existing reference file data, a
re-initialization of the reference file data may instead become
necessary  and  the  refinement  process  continued  until  the  words 
involved  have  had  their  reference  file  data  updated  the  required 
four  times.  To  perform  the  refinement  process,  it  is  necessary  that 
reference  file  data  already  exist  for  all  of  the  possible  16  words. 
To  this  end,  the  test  of  10.2.2.6  shall  be  passed  prior  to  starting 
the  refinement  process.  The  refinement  process  shall  be  as  follows. 

10.2.3.1  Phrase  prompt  initialization.  Prior  to  the  processing  of 
speech  data  for  the  phrase  selected  for  prompting  by  the  previous 
processing,  10.2.2.6  or  10.2.3.5,  the  following  initialization  shall 
be  implemented. 

a.  The parameters MINn, LTIMEn and MODEn used to locate
valley points in the scanning error functions for the four words
shall be initialized respectively to 900, zero and search-for-peak
for all n = 1, 2, 3, 4.

b.  The parameters EPEAK and SPEECH used to find the end-of-speech
shall be set to zero and not-started respectively.

c.  The parameter BESTSQ used to locate an optimum sequence of
valley points shall be initialized to the largest positive integer
possible for the computer.

d.  For the words in the phrase selected for prompting at this
time, the average reference data shall be computed as follows where
NTCm(n) is the number of times data has been collected for word m to
be prompted as the nth word in the phrase. The data used in the
averaging, sr_m(n) and sΔTref_m(n), are as defined in 10.2.2.5 if
NTCm(n) = 1, or are as defined in 10.2.3.4 if NTCm(n) > 1.

(1)  The reference pattern data for word m shall be
averaged by:

r_m(n)(i,k) = sr_m(n)(i,k)/NTCm(n) (rounded)
for i = 1, 2,...14, and k = 1, 2,...6

(2)  The expected time interval between word m and any next
word shall be averaged by:

ΔTref_m(n) = sΔTref_m(n)/NTCm(n) (rounded)

e.  If the phrase to be prompted at this time is not a reprompt
due to the decisions of previous processing, as is the case when
refinement of reference file data first begins, the parameter NOTREG
used to count misregistered phrases shall be set to zero. In the
case where the current phrase is a reprompt, NOTREG shall remain
unchanged.

f.  The phrase selected by the previous processing shall be
prompted through the prompting unit.

The processing of 10.2.3.2 shall be implemented when this
initialization is complete.

10.2.3.2 Speech processing. For each centisecond following the end
of prompting, while the enrollee is repeating the prompted phrase
and until end of speech is declared, 10.2.3.2.4, the procedures of
this section shall be implemented.

10.2.3.2.1 Data Compression. The process of data compression as
applied to the filtered data shall be the same as that which is
described in 10.2.2.2.1. The output of this process is the
quantized data q(i,L) and the maximum filter energy e(L) for i = 1,
2,...14 corresponding to the 14 filters and corresponding to L, the
time in centiseconds since the end of prompting.

10.2.3.2.2 Scanning error. For each centisecond following the end
of prompting, the scanning error centered at time L - 7 for each of
the four words prompted (n = 1, 2, 3, 4) shall be defined as
follows:

\[ SE_n(L-7) = \sum_{k=1}^{6} \sum_{i=1}^{14} X_{ik}^2 \]

where

\[ X_{ik} = \begin{cases} r_{m(n)}(i,k) & \text{for } L+2k-14 \le 0 \\ r_{m(n)}(i,k) - q(i,\, L+2k-14) & \text{for } L+2k-14 > 0 \end{cases} \]

r_m(n)(i,k) is the reference pattern data, and q(i,t) is the
quantized filter data at time t.

10.2.3.2.3 Optimum sequence of scanning error valley points. To
select an optimum sequence of scanning error valley points, the
valley points, i.e., local minima, of each of the four scanning
error functions must be found and then, selecting one point per
function, a sequence is formed. That sequence will be considered
further if the time intervals between valley points compare
favorably with the expected time intervals ΔTref_m as contained in
the reference data. For those sequences which do compare well, a
sequence error is computed and that sequence yielding the minimum
error when end-of-speech is declared is called the optimum sequence.
The procedure in locating this optimum sequence shall be as follows,
see Figure 8.

Figure 8. OPTIMUM SEQUENCE OF VALLEY POINTS
(Legend: A, B, C: VALLEY POINT ACCEPTED; ®: OPTIMUM SEQUENCE.
NOTE: TSMIN3(2) IS ACCEPTED BEFORE TSMIN2(1) BECAUSE B > C.)

10.2.3.2.3.1 Locate valley points. For the current time L, each of
the four scanning errors SEn(L - 7) centered at time L - 7 shall be
considered in turn. For this, the nth, scanning error function, a
valley locating mode MODEn, and values for MINn and LTIMEn, will have
been determined at time L - 1, or during phrase prompt
initialization if L = 1. Based on MODEn for this function and the
value of SEn(L - 7), the values to which MINn, LTIMEn and MODEn
shall be updated to be used for this function at the next
centisecond, at L + 1, shall be as presented in Table XI. As an
example, if SEn(L - 7) is greater than or equal to the current value
of MINn and the current mode is search-for-peak, MINn shall be set
to SEn(L - 7), LTIMEn shall be set to L - 7 and the mode for the
next time, at L + 1, shall be search-for-peak.

TABLE XI

VALLEY POINT TABLE

Current Mode        Condition                            MIN        LTIME   MODE
Search for Peak     SEn(L-7) ≥ MINn                      SEn(L-7)   L-7     *
Search for Peak     MINn > SEn(L-7) > (MINn*10)/15       *          *       *
Search for Peak     (MINn*10)/15 ≥ SEn(L-7)              SEn(L-7)   L-7     Valley
Search for Valley   MINn > SEn(L-7)                      SEn(L-7)   L-7     *
Search for Valley   (MINn*15)/10 > SEn(L-7) ≥ MINn       *          *       *
Search for Valley   SEn(L-7) ≥ (MINn*15)/10              SEn(L-7)   L-7     Peak

*Unchanged

If the current mode is search-for-valley and SEn(L - 7) is greater
than or equal to (MINn*15)/10, then a valley point is said to be
located. In this case, prior to the updating indicated in the table
and then only if the unupdated value of MINn is less than or equal
to 600 and the unupdated value of LTIMEn is greater than seven
centiseconds, the valley point shall be saved. If the valley point
is to be saved, the unupdated values of MINn and LTIMEn shall be
stored in arrays ESMINn and TSMINn respectively where up to five
such valley points shall be provided for each of the four scanning
error functions. If more than five valley points are found for a
given function, these arrays shall be used circularly whereby the
oldest valley point is discarded in favor of the newest. If a
valley point was saved at this time for this, the nth, scanning
error function, the procedures of 10.2.3.2.3.2 shall be implemented
next. If a valley point was not saved at this time and if n ≤ 3,
the valley locating procedures of this section shall be implemented
next for the (n+1)th scanning error function. If, however, n=4 and
no valley point was saved at this time, the end of speech test as
per 10.2.3.2.4, or 10.3.2.3 if the verification algorithm is being
implemented, shall be implemented next.
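
The scanning error of 10.2.3.2.2 and the valley locating rule of 10.2.3.2.3.1 (Table XI) can be sketched together. This is a hedged illustration: the function names are hypothetical, reference and filter data are passed as callables `r(i, k)` and `q(i, t)`, integer division is assumed for the threshold expressions, and the circular arrays are modeled by keeping only the newest five entries of a list.

```python
SEARCH_PEAK = "search-for-peak"
SEARCH_VALLEY = "search-for-valley"


def scanning_error(r, q, L):
    """Scanning error centered at L - 7 for one word.

    Samples with time index L + 2k - 14 <= 0 have no filter data, so
    the reference value alone is squared for them.
    """
    total = 0
    for k in range(1, 7):
        t = L + 2 * k - 14
        for i in range(1, 15):
            x = r(i, k) - (q(i, t) if t > 0 else 0)
            total += x * x
    return total


def track_valleys(se_values, keep=5):
    """Return saved valley points (time, error) for one scanning error function.

    se_values -- list of (center_time, SE) pairs in time order
    """
    MIN, LTIME, mode = 900, 0, SEARCH_PEAK   # phrase prompt initialization
    saved = []
    for t, se in se_values:
        if mode == SEARCH_PEAK:
            if se >= MIN:
                MIN, LTIME = se, t
            elif (MIN * 10) // 15 >= se:
                MIN, LTIME, mode = se, t, SEARCH_VALLEY
        else:
            if se < MIN:
                MIN, LTIME = se, t
            elif se >= (MIN * 15) // 10:
                # Valley located: save it only if the unupdated values qualify.
                if MIN <= 600 and LTIME > 7:
                    saved.append((LTIME, MIN))
                    saved = saved[-keep:]    # oldest entry is dropped first
                MIN, LTIME, mode = se, t, SEARCH_PEAK
    return saved
```

For the scanning error values 1000, 1200, 700, 500, 300, 400, 500 at center times 8 through 14, the sketch saves one valley point with error 300 at time 12.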

10.2.3.2.3.2 Finding the optimum sequence. If the valley point was
saved  as  per  10.2.3.2.3.1  at  the  current  time  for  the  scanning  error 
function  currently  under  consideration  (i.e.,  SEn)  it  shall  be 
subjected  to  tests  to  determine  if  it  is  to  become  a  part  of  a 
sequence.  If  it  is  and  if  the  sequence  can  be  formed  at  this  time, 
the  sequence  is  tested  to  determine  if  it  is  the  optimum  sequence 
found  thus  far.  By  always  retaining  only  the  sequence  which  is  the 
optimum  that  has  been  found  thus  far  the  sequence  which  exists  when 
end-of-speech  is  declared  is  that  sequence  which  becomes  known  as 
the  optimum  for  the  phrase. 

In the following, the term couplet is used to denote a pair of
valley points, each consisting of a time and scanning error, from
consecutive scanning error functions, e.g., SE_I and SE_(I+1). Since the
valley points have been saved in circular arrays TSMIN_I and ESMIN_I,
j_I is used as an index to select one of the valley points of SE_I
saved in these arrays. Hence, 1 ≤ I ≤ 4 but 1 ≤ j_I ≤ 5. The testing
procedures are as indicated below.

10.2.3.2.3.2.1  Successor  couplets.  Successor  couplets  shall  be 
formed using the just saved valley point of SEn and valley points
already saved for SEn+1. If no valley points have been saved for SEn+1
or if n=4 whereby SE5 does not exist, processing shall proceed
immediately to 10.2.3.2.3.2.2. If n < 4 and valley points have been
saved for SEn+1, the couplets shall be formed starting with the most
recently saved valley point and progressing to the oldest saved
valley point of SEn+1. As each such couplet is formed it shall be
tested  according  to  10.2.3.2.3.2.1.1  and  10.2.3.2.3.2.1.2  where  I=n 
in  these  tests.  When  all  such  couplets  have  been  so  tested, 
processing  shall  proceed  to  10.2.3.2.3.2.2. 

10.2.3.2.3.2.1.1 Time restriction test. Using the couplet selected
by previous processing, the following test shall be performed. If

\[ \Delta Tref_{m(I)} / 2 < \Delta T < 2 \cdot \Delta Tref_{m(I)} \]

where ΔTref_m(I) is the reference data expected time
interval between word m, prompted as
the Ith word in the phrase, and
whatever word is prompted next,

\[ \Delta T = TSMIN_{I+1}(j_{I+1}) - TSMIN_I(j_I) \]

j_(I+1) and j_I are the indices to the valley points
selected for this couplet,
and TSMIN_(I+1)(j_(I+1)) and TSMIN_I(j_I) are the times of
these valley points,

then the couplet is said to have passed the time restriction test.
If the couplet currently under consideration passes this test,
processing shall proceed to 10.2.3.2.3.2.1.2. If the couplet did
not pass this test, processing shall proceed as in 10.2.3.2.3.2.1,
or 10.2.3.2.3.2.2 if predecessor couplets are being tested.


10.2.3.2.3.2.1.2 Point pair test. Using the couplet which passed
the time restriction test, a point pair error EW shall be computed
as follows:

\[ EW = \left\{ \left\{ \left[ \left( MXD + 4 \cdot \left| \Delta T - \Delta Tref_{m(I)} \right| \right) \cdot \left( ESMIN_I(j_I) + 1 \right) \right] / MXD \right\} \cdot \left( ESMIN_{I+1}(j_{I+1}) + 1 \right) \right\} / 2048 \quad \text{(d.p.)} \]

where ΔT, ΔTref_m(I), j_I and j_(I+1) are as defined above,
ESMIN_I(j_I) and ESMIN_(I+1)(j_(I+1)) are the scanning errors of the
valley points selected for this couplet, and
MXD is the maximum of 80 and 4*ΔTref_m(I).

If this couplet produces an EW greater than or equal to 70, the
couplet is said to have failed the point pair test and processing
shall continue as indicated in 10.2.3.2.3.2.1, or 10.2.3.2.3.2.2 for
predecessor couplets. If this couplet produces an EW less than 70,
the couplet along with its EW shall be saved as follows:

PPT_I(k_I, 1) = TSMIN_I(j_I)
PPT_I(k_I, 2) = TSMIN_(I+1)(j_(I+1))
PPE_I(k_I, 1) = ESMIN_I(j_I)
PPE_I(k_I, 2) = ESMIN_(I+1)(j_(I+1))
PPEW_I(k_I) = EW

where provision shall be made for the saving of data for up to five
such couplets (1 ≤ k_I ≤ 5) for I = 1, 2, 3. If more than five
couplets are found which pass these tests for any I, the associated
arrays shall be used circularly whereby the oldest couplet data is
discarded in favor of the newest. Having saved the couplet data,
processing shall continue as in 10.2.3.2.3.2.1, or for predecessor
couplets, 10.2.3.2.3.2.2.
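
The two couplet tests can be sketched as follows. The function names are hypothetical, and the "(d.p.)" arithmetic is modeled with Python integers and truncating division, which is an assumption about how intermediate results were handled.

```python
def time_restriction_ok(dt, dtref):
    """Time restriction test: dtref/2 < dt < 2*dtref."""
    return dtref / 2 < dt < 2 * dtref


def point_pair_error(dt, dtref, esmin_i, esmin_next):
    """Point pair error EW for one couplet (integer arithmetic assumed)."""
    mxd = max(80, 4 * dtref)
    partial = ((mxd + 4 * abs(dt - dtref)) * (esmin_i + 1)) // mxd
    return (partial * (esmin_next + 1)) // 2048


def evaluate_couplet(t_i, e_i, t_next, e_next, dtref):
    """Return EW if the couplet passes both tests, otherwise None."""
    dt = t_next - t_i
    if not time_restriction_ok(dt, dtref):
        return None
    ew = point_pair_error(dt, dtref, e_i, e_next)
    return ew if ew < 70 else None
```

As a worked case, an expected interval of 40, an observed interval of 42, and scanning errors of 100 and 120 give MXD = 160 and EW = 6, so the couplet would be saved.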

10.2.3.2.3.2.2 Predecessor couplets. Predecessor couplets shall be
formed, tested and saved as appropriate in the same manner as was
done for successor couplets. Predecessor couplets consist of valley
points already saved for SEn-1 and the just saved valley point of
SEn. If no valley points have been saved for SEn-1 or if n=1
whereby SE0 does not exist, processing shall proceed in accordance
with 10.2.3.2.3.2.3. If n > 1 and valley points have been saved for
SEn-1, these couplets shall be formed starting with the most recently
saved valley point and progressing to the oldest saved valley point
of SEn-1. As each such couplet is formed, it shall be tested as in
10.2.3.2.3.2.1.1 and 10.2.3.2.3.2.1.2 where I=n-1 in these tests.
When all such couplets have been so tested, processing shall proceed
to 10.2.3.2.3.2.3.

10.2.3.2.3.2.3 Forming sequence. A chain of couplets is created by
finding k_1, k_2 and k_3 such that

PPT_I(k_I, 2) = PPT_(I+1)(k_(I+1), 1)

PPE_I(k_I, 2) = PPE_(I+1)(k_(I+1), 1)

for I = 1 and I = 2

where the PPT and PPE arrays are as defined in 10.2.3.2.3.2.1.2.

For each such created chain which includes both predecessor (if n >
1) and successor (if n < 4) couplets for the just saved valley point
of SEn, a sequence error, eseq, shall be computed as follows:

\[ eseq = \sum_{I=1}^{3} PPEW_I(k_I) \]

where the k_I's are the indices of the chain and PPEW is as
defined in 10.2.3.2.3.2.1.2.

If this value of eseq is less than the sequence errors for all
previous chains (i.e., if eseq < BESTSQ), then BESTSQ shall be set
to eseq and the valley points of the chain shall be saved as
follows:

OPTSE_I = PPE_I(k_I, 1)
OPTT_I = PPT_I(k_I, 1)
for I = 1, 2, and 3
and OPTSE_4 = PPE_3(k_3, 2)
OPTT_4 = PPT_3(k_3, 2)

These values of OPTT_I (for I = 1, 2, 3 and 4) shall be further
adjusted based on the values of the smoothed energy function es(t)
around these times. Letting successively t_I = OPTT_I - 1, t_I = OPTT_I and
t_I = OPTT_I + 1, the smoothed energies shall be computed as follows:

\[ es(t_I) = \frac{1}{2} \sum_{k=-6}^{+6} e(t_I + k) \]

where e(t) is the maximum filter energy at time t of 10.2.2.2.1.1.d.

That value of t_I which yields the maximum value of es(t_I) shall be
used as the reported value of OPTT_I. The values of OPTSE_I shall not
be affected by this adjustment.

When each such chain involving the just saved valley point has been
so tested, or if no such chain could be created, processing shall
proceed with the valley locating procedures of 10.2.3.2.3.1 if
n < 4. If n=4, i.e., the just saved valley point was from SE4,
processing shall instead proceed to the end-of-speech test given
below, or 10.3.2.3 if the verification algorithm is being
implemented.
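
Chain formation and the final time adjustment can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, all stored couplets are searched rather than only the chains through the just saved valley point, the couplet arrays are plain lists indexed from zero, and ties in the energy comparison go to the earliest candidate time. A constant scale factor on the energy sum does not change which candidate wins, so none is applied.

```python
def best_chain(ppt, ppe, ppew):
    """Return (eseq, OPTT, OPTSE) for the lowest-error chain, or None.

    ppt[I][k]  -- (time of first point, time of second point) of couplet k
    ppe[I][k]  -- (error of first point, error of second point)
    ppew[I][k] -- point pair error EW, for I = 0, 1, 2
    """
    best = None
    for k1 in range(len(ppew[0])):
        for k2 in range(len(ppew[1])):
            if ppt[0][k1][1] != ppt[1][k2][0] or ppe[0][k1][1] != ppe[1][k2][0]:
                continue                      # couplets do not share a point
            for k3 in range(len(ppew[2])):
                if ppt[1][k2][1] != ppt[2][k3][0] or ppe[1][k2][1] != ppe[2][k3][0]:
                    continue
                eseq = ppew[0][k1] + ppew[1][k2] + ppew[2][k3]
                if best is None or eseq < best[0]:
                    optt = [ppt[0][k1][0], ppt[1][k2][0],
                            ppt[2][k3][0], ppt[2][k3][1]]
                    optse = [ppe[0][k1][0], ppe[1][k2][0],
                             ppe[2][k3][0], ppe[2][k3][1]]
                    best = (eseq, optt, optse)
    return best


def adjust_time(e, t):
    """Pick among t-1, t, t+1 the time with the largest 13-sample energy sum."""
    def window_sum(center):
        return sum(e(center + k) for k in range(-6, 7))
    return max((t - 1, t, t + 1), key=window_sum)
```

Here `e` is a callable returning the maximum filter energy at a given time; the caller is assumed to supply values for every time the 13-sample window touches.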

10.2.3.2.4 End-of-speech. The test for determining when end-of-
speech  has  occurred  during  the  refinement  of  reference  data  shall  be 
implemented  in  the  same  manner  as  that  described  in  10.2.2.2.3.  If 
end-of-speech  is  declared,  no  more  processing  of  filter  data  is 
required  and  the  data  collected  shall  be  tested  for  suitability  as 
per  10.2.3.3.  If  end-of-speech  is  not  declared,  speech  processing 
shall  continue  as  per  10.2.3.2. 

10.2.3.3  Suitable  data.  Once  end-of-speech  is  declared,  the  data 
resulting  from  this  processing  shall  be  deemed  to  be  suitable  or  not 
in  the  following  manner.  If  no  filter  overload  status  condition  was 
sensed  and  if  an  optimum  sequence  was  found,  i.e.,  BESTSQ  was 
changed  from  its  initial  value  of  the  largest  positive  integer,  then 
the data shall be deemed suitable and the reference update procedure
of 10.2.3.4 shall be implemented. If, however, a filter overload
occurred, the data resulting from the speech processing shall be
discarded,  the  filter  overload  procedures  shall  be  implemented  and 
this  phrase  shall  be  reprompted  as  the  next  phrase  with  processing 
restarting  as  per  10.2.3.1.  There  shall  be  no  limit  to  the  number 
of  times  a  phrase  is  reprompted  in  this  manner  due  to  filter 
overloads.  If  the  filters  did  not  overload  but  instead  the  phrase 
"mis-registered",  i.e.,  an  optimum  sequence  was  not  found,  the  data 
is  unsuitable  but  subsequent  processing  is  dependent  on  the  number 
of  times  this  phrase  has  mis-registered.  In  this  case  the  parameter 
NOTREG  shall  be  incremented  by  one.  If  NOTREG  becomes  equal  to  one, 
i.e.  this  is  the  first  time  the  phrase  mis-registered,  the  data 
resulting  from  the  speech  processing  shall  be  discarded,  the  filter 
gain  adjustment  procedures  of  10.2.2.3.2  shall  be  implemented  and 
this  phrase  shall  be  reprompted  as  the  next  phrase  with  processing 
restarting. If, however, NOTREG becomes two, i.e. this is the
second time this same phrase has mis-registered, the data
compression results (e(L) and \psi(i,L)) from 10.2.3.2.1 shall be used
to re-initialize the reference data as per 10.2.2.1 through 10.2.2.6,
with two exceptions: 10.2.2.1.d is excluded, since no phrase shall be
prompted, and all sections of 10.2.2.2.1 are excluded, since the
compressed data is already available.
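
The suitability rules of 10.2.3.3 can be read as a small decision
function. The following is a minimal sketch, not the specified
implementation: the function name, the action labels and the use of
sys.maxsize as the "largest positive integer" are assumptions made for
illustration, and any NOTREG value of two or more is treated as the
"second mis-registration" case.

```python
import sys

# Assumed stand-in for "the largest positive integer" initial value of BESTSQ.
BESTSQ_INITIAL = sys.maxsize


def classify_refinement_phrase(filter_overload, bestsq, notreg):
    """Return (action, updated NOTREG) for one refinement phrase.

    The action labels are hypothetical names for the procedures that
    10.2.3.3 refers to.
    """
    if filter_overload:
        # Discard data, run the overload procedures, reprompt (no limit).
        return "reprompt_after_overload", notreg
    if bestsq != BESTSQ_INITIAL:
        # An optimum sequence was found: the data is suitable (10.2.3.4).
        return "update_reference", notreg
    # Phrase mis-registered: count it and decide what to do next.
    notreg += 1
    if notreg == 1:
        # First mis-registration: discard, adjust gain, reprompt.
        return "reprompt_after_gain_adjust", notreg
    # Second mis-registration: re-initialize the reference data.
    return "reinitialize_reference", notreg
```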

10.2.3.4 Reference data update. For that data deemed suitable as
per 10.2.3.3 the following reference data updating shall be
performed. The reference data to be updated are those values of
sr_m(i,k), s\Delta Tref_m, sESE_m, and NTC_m as created by 10.2.2.5 and
possibly previously updated by this section. For only those words
used in the prompting of this phrase, the corresponding NTC_m will
determine what updating is to be performed. If for word m, prompted
as the nth word in the phrase, the value of NTC_m equals 32, no
updating shall be performed. If, however, NTC_m is less than 32,
the following updating shall be performed using the optimum sequence
data, OPTSE_n and OPTT_n, n = 1, 2, 3, 4, defined in 10.2.3.2.3.2.3.

a. Reference pattern data

sr_{m(n)}(i,k) = sr_{m(n)}(i,k) + \psi(i, OPTT_n - 7 + 2k)

for i = 1, 2, ... 14, k = 1, 2, ... 6 and where \psi(i,t) is the data
generated by 10.2.3.2.1.

b. Expected time interval reference data

s\Delta Tref_{m(n)} = s\Delta Tref_{m(n)} + OPTT_{n+1} - OPTT_n

c. Expected scanning error reference data

sESE_{m(n)} = sESE_{m(n)} + OPTSE_n  if NTC_{m(n)} < 4

If NTC_{m(n)} \ge 4, no updating of the expected scanning error data
shall be performed.

d. Data collection count

NTC_{m(n)} = NTC_{m(n)} + 1

When  all  indicated  updating  is  complete,  the  procedures  of  10.2.3.5 
shall  be  implemented. 
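
The four running-sum updates above can be sketched for a single word
as follows. This is an illustrative sketch only: the dictionary
layout, the function name and the psi callable are assumptions, and
the time-interval sum is updated only for n = 1, 2, 3 on the
assumption that no interval follows the fourth word.

```python
def update_reference_sums(word, n, psi, optt, optse):
    """Accumulate one registered phrase into the sums for one word.

    word  : dict with 'sr' (14 x 6 list), 'sdt', 'sese', 'ntc' (assumed layout)
    n     : 1-based position of the word in the prompted phrase
    psi   : callable psi(i, t) giving the quantized filter data
    optt  : valley-point times OPTT_1..OPTT_4
    optse : scanning errors OPTSE_1..OPTSE_4
    """
    if word["ntc"] >= 32:
        return  # data collection count is saturated: no updating
    for i in range(1, 15):
        for k in range(1, 7):
            # Six samples, two centiseconds apart, centered on OPTT_n.
            word["sr"][i - 1][k - 1] += psi(i, optt[n - 1] - 7 + 2 * k)
    if n < 4:
        # Interval to the next word (assumption: undefined for n = 4).
        word["sdt"] += optt[n] - optt[n - 1]
    if word["ntc"] < 4:
        word["sese"] += optse[n - 1]
    word["ntc"] += 1
```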

10.2.3.5 Refinement or termination. When updating for this phrase
is complete, a determination is made as to whether further reference
data refinement is required or whether the enrollment process shall
be terminated. If each of the possible 16 words has a data
collection count NTC_m which is greater than or equal to five, the
enrollment process shall be terminated and the procedures of 10.2.4
implemented. If, however, one or more of the NTC_m is less than
five, further refinement shall be required. As this will entail the
prompting of another phrase, the filter gain adjustment procedures
of 10.2.2.3.2 shall be implemented where the current phrase is said
to be not mis-registered. Once this is complete, the processing
starting at 10.2.3.1 shall be repeated where the phrase to be
prompted shall be the next phrase in the list. If all 32 phrases
have already been prompted, a new list shall be created in
accordance with 10.1.3.3 and the first phrase in this new list shall
be that which is prompted.

10.2.4  Enrollment  termination.  When  the  enrollment  process  is 
deemed  complete  as  per  10.2.3.5,  the  data  to  be  stored  on  the 
reference  file  associated  with  the  individual  through  the  coded 
badge,  personnel  code  and  authorization  file  shall  be  computed.  The 
data  used  in  this  process  shall  be  that  which  resulted  from  the  last 
updating  performed  for  each  of  the  16  words.  The  procedure  shall  be 
as  follows: 

a. Reference pattern data

r_m(i,k) = sr_m(i,k)/NTC_m (rounded)

for i = 1, 2, ... 14; k = 1, 2, ... 6; m = 1, 2, ... 16

These  averages  shall  be  stored  on  the  individual  reference  file 
providing  enough  storage  such  that  updating  during  subsequent 
verifications  shall  permit  significance  to  the  nearest  1/32,  i.e. 
five  fractional  bits  shall  be  provided  for  each  pattern  data  point 
although the data at this point shall be rounded to the nearest
integer. 

b. Time interval reference data

\Delta Tref_m = s\Delta Tref_m/NTC_m

for m = 1, 2, ... 12

These averages shall be computed and rounded to the nearest 0.1.
The results of these averages shall be stored on the individual
reference file.

c. Expected scanning error reference data

ESE_m = sESE_m / [divisor illegible in source]

for m = 1, 2, ... 16

These averages shall be computed and rounded to the nearest 0.1.
Since these averages tend to be underestimated, the resultant ESE_m
shall be further scaled by

ESE_m = ([factor illegible in source]*ESE_m)/100 (d.p.)

for m = 1, 2, ... 16

and  limited  to  a  value  no  less  than  110.  These  scaled  and  limited 
averages  shall  also  be  computed  and  rounded  to  the  nearest  0.1 
before  being  stored  on  the  individual  reference  file. 

When  all  reference  data  has  been  thus  computed  and  stored,  the 
enrollment  process  is  complete  and  the  instruction  "thank  you"  shall 
be  prompted  to  the  individual  or  enrollee. 
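
The averaging of items a and b can be sketched as below. This is a
hedged illustration: the function name is hypothetical, and
round-half-up is an assumption, since the text says only "rounded".
The expected scanning error of item c is left out because its divisor
and scale factor are not legible in the source.

```python
import math


def finalize_word(sr, sdt, ntc):
    """Average the enrollment sums for one word.

    sr  : summed pattern data (list of rows of summed samples)
    sdt : summed time interval for the word
    ntc : data collection count NTC_m (at least five at termination)
    Returns (pattern rounded to integers, interval rounded to 0.1).
    """
    # Pattern data: average and round to the nearest integer.
    r = [[math.floor(v / ntc + 0.5) for v in row] for row in sr]
    # Time interval: average and round to the nearest 0.1.
    dtref = math.floor(sdt / ntc * 10 + 0.5) / 10
    return r, dtref
```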

10.2.5 Enrollment interruptions. The enrollment process shall be
interruptible in either of the two following ways. If at any time
during  speech  processing  until  end-of-speech  is  declared,  the 
individual  sends  a  "reprompt"  signal  to  the  processor,  the  current 
speech  processing  shall  be  halted,  all  intermediate  results  computed 
for  the  phrase  shall  be  discarded,  the  same  phrase  shall  be 
reprompted  and  the  phrase  processing  repeated.  There  shall  be  no 
limit  to  the  number  of  times  an  individual  may  thus  request  a 
reprompt.  If  at  any  time  during  the  enrollment  procedure  the 
operator  sends  a  "stop"  signal  to  the  processor,  the  enrollment 
process  shall  be  terminated  immediately  and  no  data  shall  be  stored. 
The  individual  shall  be  informed  of  this  premature  termination  by  an 
appropriate  prompted  instruction  such  as  "call  for  assistance". 
10.3 Verification algorithm. The verification algorithm consists
of three phases: (1) initialization, (2) phrase processing and (3)
decision  making.  The  procedure  involves  the  prompting  of  a  phrase, 
the  processing  of  the  filtered  data  resulting  from  the  entrant 
repeating  the  phrase  and,  if  the  data  is  suitable,  the  computing  of 
a  score  which  measures  the  closeness  of  this  data  to  the  reference 
file  data.  If  the  score  reflects  a  desired  degree  of  closeness,  the 
entrant  is  said  to  be  verified.  If  this  is  not  the  case,  more 
phrase  prompting  may  be  allowed  until  a  verified  decision  is  made  or 
until  the  entrant  is  said  to  be  not  verified  due  to  exhausting  the 
number  of  promptings  allowed.  Figure  9  is  a  flowchart  of  the 
verification  algorithm.  The  notation  glossary  of  Table  VI  will  be 
adhered  to  in  the  following  description. 

10.3.1  Verification  initialization.  The  verification  function 
shall  begin  by  the  acceptance  of  an  entrant's  ID  number  from  the 
pedestrian  booth  through  the  authorization  file.  If  the  entrant  is 
authorized  entry  to,  or  exit  from  the  restricted  area,  the  following 
initialization  shall  be  implemented. 

a. The filter gain G, which has possible values 1, 2, 4, ...
up to a maximum filter gain, shall be initialized to two and the
filter overload indicator shall be cleared.

b.  The  reference  file  data  as  initially  created,  10.2.4,  or 
as  last  updated,  10.3.3.4.3,  shall  be  retrieved. 

c. A list of eight phrases in random order, divided into two
groups of four, shall be created as per 10.1.3.3. The first group
shall be called the "initial group"; the second group, the
"auto-abort group". When processing begins as per 10.3.2, it shall
be the first phrase in the "initial group" which shall be prompted.

d. The parameters IREG, NOTREG, EHAT and EUSE used in
determining the verification decision shall be set to zero.

e.  If  the  entrant  has  performed  the  verification  process  a 
sufficient  number  of  times  such  that  at  least  four  resultant 
decisions  have  been  "entrant  verified",  the  mode  of  the  verification 
algorithm  shall  be  said  to  be  "normal".  If  the  entrant  has  not 
successfully  verified  four  times,  the  algorithm  mode  shall  be  said 
to  be  "post-enrollment"  or  PE. 

With this initialization complete, processing shall begin in
accordance with 10.3.2.
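
As an illustration of items a, d and e, the per-attempt state might be
held in a small record like the one below. The class and field names
are assumptions for this sketch; reference file retrieval and phrase
list creation (items b and c) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class VerificationState:
    gain: int = 2                 # filter gain G, initialized to two
    filter_overload: bool = False
    ireg: int = 0                 # count of registered phrases
    notreg: int = 0               # count of mis-registered phrases
    ehat: float = 0.0
    euse: float = 0.0
    group: str = "initial"        # "initial" or "auto-abort"
    mode: str = "PE"              # "normal" or "PE" (post-enrollment)


def init_verification(prior_verified_count):
    """Create the state; the mode is normal after four verified decisions."""
    mode = "normal" if prior_verified_count >= 4 else "PE"
    return VerificationState(mode=mode)
```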

10.3.2 Verification phrase processing. The processing of phrases
during verification is identical to that of the enrollment speech
processing during the refinement phase, 10.2.3.2, with the
exceptions that the phrase prompt initialization differs and the
end-of-speech test involves one more step. Verification phrase
processing shall be implemented as follows.

[Figure 9. Verification algorithm. Flowchart blocks: verification
initialization (10.3.1); phrase prompt initialization (10.3.2.1);
data compression (10.2.3.2.1); scanning error (10.2.3.2.2); optimum
sequence of scanning error valley points (10.2.3.2.3); end-of-speech
(10.2.2.2.3, 10.3.2.3); verification decision (10.3.3), branching to
registered phrase (10.3.3.1: saving data 10.3.3.1.1, verification
scoring 10.3.3.1.2, entrant verified 10.3.3.1.3, phrase did not pass
10.3.3.1.4), filter overload (10.3.3.2), phrase mis-registered
(10.3.3.3) and auto-abort (10.3.3.4).]

10.3.2.1  Phrase  prompt  initialization.  The  processing  of  speech 
data  for  the  phrase  selected  for  prompting,  10.3.1  or  10.3.3,  shall 
be  preceded  by  the  following  initialization: 

a. The parameters MIN_n, LTIME_n, and MODE_n used to locate
valley points in the scanning error functions for each of the four
prompted words shall be initialized respectively to 900, zero and
search-for-peak for all n = 1, 2, 3, 4.

b. The parameters EPEAK, SPEECH, and MAXCUT used to find the
end of speech shall be set to zero, "not-started" and 400
respectively.

c.  The  parameter  BESTSQ  used  to  locate  the  optimum  sequence  of 
valley  points  shall  be  initialized  to  the  largest  positive  integer 
possible  for  the  computer. 

d. For the four words in the phrase selected for prompting at
this time, the following summed expected scanning error shall be
computed:

sESE = \sum_{n=1}^{4} ESE_{m(n)}

where ESE_{m(n)} is the reference expected scanning error for word m
prompted as the nth word in the phrase. This summed value shall be
computed maintaining significance to the nearest 0.1 but the
resultant sum shall be rounded to the nearest integer. For use only
during verification phrase processing, i.e. all sections of 10.3.2,
the other reference data (r_m(i,k) and \Delta Tref_m) for the words in
the phrase to be prompted shall be rounded to the nearest integer.

e. The phrase selected by the previous processing shall be
prompted through the prompting unit.

When this initialization is complete, the processing of 10.3.2.2
shall be implemented.

10.3.2.2 Speech processing. For each centisecond following the end
of prompting, while the entrant is repeating the prompted phrase,
and until end-of-speech is declared, the speech processing
procedures, 10.2.3.2, shall be implemented using the end-of-speech
test which follows, 10.3.2.3, rather than that of 10.2.3.2.4.

10.3.2.3 End-of-speech. If less than 10 centiseconds has elapsed
since the end of prompting, the end-of-speech test shall not be
implemented and speech processing shall continue as per 10.3.2.2.

If, however, L is 10 or more, the test for determining when
end-of-speech has occurred shall be implemented in the same manner as
that described in 10.2.2.2.3 with the following addition. If
end-of-speech was not declared as per 10.2.2.2.3.1 and 10.2.2.2.3.2,
the following additional test shall be implemented. If an optimum
sequence was found at this time as per 10.2.3.2.3.2, the value of
MAXCUT shall be recomputed by:

MAXCUT = OPTT_1 + 7 + \sum_{n=1}^{3} \Delta Tref_{m(n)}

where OPTT_1 is the time of the first valley point in the optimum
sequence and \Delta Tref_{m(n)} are the expected time intervals between
the prompted words. If the resultant value of MAXCUT is greater than
400, it shall be limited to 400. If no optimum sequence was saved at
this time, MAXCUT shall be left at its previous value.

If  L,  the  total  elapsed  time  since  the  end  of  prompting,  is  greater 
than  or  equal  to  the  current  value  of  MAXCUT  and  if  the  maximum 
filter  energy  at  this  time,  e(L)  of  10.2.2.2.1.1,  is  less  than 
EPEAK/8,  with  EPEAK  as  given  in  10.2.2.2.3,  then  end-of-speech  is 
declared.  If  end-of-speech  has  not  been  declared  as  per  this  test 
or  those  of  10.2.2.2.3,  speech  processing  shall  continue  as 
described in 10.3.2.2. If end-of-speech is declared by any one of
these tests, speech processing shall be stopped and the decision
making procedures of 10.3.3 shall be implemented.
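
The additional verification-only test can be sketched as two small
functions. This is an illustration under stated assumptions: the
function names are hypothetical, the three expected intervals are
passed in as a list, and the tests of 10.2.2.2.3 are assumed to have
been applied separately.

```python
def recompute_maxcut(maxcut, optt1, dtref):
    """Recompute MAXCUT when an optimum sequence has been found.

    maxcut : previous MAXCUT value (initialized to 400)
    optt1  : time of the first valley point, or None if no sequence found
    dtref  : expected intervals between the four prompted words (three values)
    """
    if optt1 is None:
        return maxcut  # no optimum sequence: leave MAXCUT unchanged
    return min(400, optt1 + 7 + sum(dtref))


def extra_end_of_speech(elapsed, maxcut, energy, epeak):
    """Additional end-of-speech test of 10.3.2.3 (times in centiseconds)."""
    if elapsed < 10:
        return False  # test not implemented this early after prompting
    return elapsed >= maxcut and energy < epeak / 8
```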

10.3.3  Verification  decision.  A  verification  decision  that  the 
entrant  has  successfully  verified  against  the  reference  file  data 
can  only  be  made  on  those  phrases  which  are  deemed  suitable. 

In  what  follows,  a  phrase  will  be  called  "mis-registered"  if  an 
optimum  sequence  could  not  be  found  as  per  10.2.3.2.3.2.  If  an 
optimum  sequence  could  be  found,  the  phrase  will  be  called 
registered  or  not  mis-registered. 

If  a  filter  overload  condition  occurred  at  any  time  during  the 
speech  processing,  the  data  is  deemed  not  suitable  and  the 
procedures  of  10.3.3.2  shall  be  implemented.  If  no  filter  overload 
occurred  but  the  phrase  mis-registered,  the  data  is  also  not 
suitable  and  the  procedures  of  10.3.3.3  shall  be  implemented.  If 
neither of these two conditions occurred, the data is suitable and
the  procedures  of  10.3.3.1  shall  be  implemented. 

10.3.3.1  Phrase  registered.  For  those  phrases  which  registered, 
the  following  shall  be  performed. 

10.3.3.1.1  Saving  data.  For  the  purpose  of  the  possible  updating 
of  reference  file  data  and  the  computation  of  a  decision  score,  the 
following  shall  be  implemented.  The  count  IREG  of  registered 
phrases  shall  be  incremented  by  one.  The  words  used  in  prompting 
the  current  phrase,  the  optimum  sequence  scanning  errors,  the  time 
intervals  between  optimum  sequence  valley  points,  and  the  quantized 
filter  data  centered  on  these  valley  points  shall  be  saved  as 
follows: 


uwd_{IREG}(n) = m(n) for n = 1, 2, 3, 4

use_{IREG}(n) = OPTSE_n for n = 1, 2, 3, 4

udt_{IREG}(n) = OPTT_{n+1} - OPTT_n for n = 1, 2, 3, and

uptn_{IREG}(i,k,n) = \psi(i, OPTT_n - 7 + 2k)
for i = 1, 2, ... 14; k = 1, 2, ... 6; n = 1, 2, 3, 4;

where m(n) is the word m prompted as the nth word in the phrase,
OPTSE_n and OPTT_n are defined in 10.2.3.2.3.2.3 and \psi(i,t) is as
defined in 10.2.2.2.1.3.


Additionally  the  parameters  EHAT  and  EUSE  shall  be  updated  as 
follows: 


EHAT = EHAT + sESE

EUSE = EUSE + \sum_{n=1}^{4} OPTSE_n

where sESE is the summed expected scanning error for the phrase and
OPTSE_n is as above.

Once  the  data  has  been  saved  in  this  manner,  the  verification 
scoring  procedures  shall  be  implemented. 

10.3.3.1.2  Verification  scoring.  A  decision  score  PDEC  used  to 
determine  whether  the  entrant  was  verified  on  this  phrase  shall  be 
computed  as  follows: 

PDEC = 100*EUSE/x (d.p., rounded)

where x is the minimum of the product 560*IREG and the value of e,
e is the maximum of the product 400*IREG and the value of EHAT, and
EUSE, IREG and EHAT are as per 10.3.3.1.1.

Based  on  the  mode  of  the  algorithm,  normal  or  PE,  and  the  group  of 
phrases to which the current phrase belongs, initial or auto-abort,
the  threshold  corresponding  to  the  current  value  of  IREG  as  given  in 
Table  XII  shall  be  used.  If  PDEC  is  less  than  or  equal  to  this 
threshold,  the  entrant  is  said  to  have  successfully  verified  and  the 
procedures  of  10.3.3.1.3  shall  be  implemented.  If  PDEC  is  greater 
than  this  threshold,  the  phrase  did  not  pass  and  the  procedures 
given  in  10.3.3.1.4  shall  be  implemented. 
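
A sketch of the scoring and threshold test follows. It rests on two
assumptions: the denominator x is read as EHAT clamped between
400*IREG and 560*IREG, and "rounded" is taken as round-half-up. The
threshold values are those of Table XII; the function and dictionary
names are hypothetical.

```python
import math

# Table XII thresholds, indexed by IREG = 1..4.
THRESHOLDS = {
    ("initial", "PE"): [0, 0, 0, 145],
    ("initial", "normal"): [100, 120, 135, 145],
    ("auto-abort", "PE"): [0, 0, 0, 145],
    ("auto-abort", "normal"): [85, 110, 130, 145],
}


def decision_score(euse, ehat, ireg):
    """PDEC = 100*EUSE/x with x = min(560*IREG, max(400*IREG, EHAT))."""
    x = min(560 * ireg, max(400 * ireg, ehat))
    return math.floor(100 * euse / x + 0.5)


def phrase_passes(euse, ehat, ireg, group, mode):
    """True when PDEC is at or below the threshold for this IREG."""
    return decision_score(euse, ehat, ireg) <= THRESHOLDS[(group, mode)][ireg - 1]
```

Under this reading a lower score means a closer match, and the zero
thresholds in post-enrollment mode effectively require four
registered phrases before a verified decision can be made.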

10.3.3.1.3 Entrant verified. Only when an entrant successfully
verifies as per 10.3.3.1.2 will the reference file data be updated.
Using \alpha = 2 if the algorithm mode is normal or \alpha = 8 if the
algorithm mode is PE, the following reference file updating shall be
implemented. For each word m = uwd_j(n) where j = 1, 2, ... IREG and
n = 1, 2, 3, 4:

r_m(i,k) = [(32-\alpha)*r_m(i,k) + \alpha*uptn_j(i,k,n)]/32
for i = 1, 2, ... 14 and k = 1, 2, ... 6

\Delta Tref_m = [(32-\alpha)*\Delta Tref_m + \alpha*udt_j(n)]/32 (d.p.)

ESE_m = [(32-\alpha)*ESE_m + \alpha*use_j(n)]/32 (d.p.)

where uptn_j, udt_j, and use_j are the data for registered phrase j,
saved as per 10.3.3.1.1 and r_m, \Delta Tref_m and ESE_m are the
reference file data for word m with all fractional information
maintained.

Once the reference file data has been updated (where r_m shall be
rounded to the nearest 1/32 and \Delta Tref_m and ESE_m shall be rounded
to the nearest 1/10), the verification algorithm is complete and the
instruction "thank you" shall be prompted to the entrant.
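
Each of the three updates is a weighted average that gives the new
measurement a weight of alpha/32. A minimal sketch, assuming the
division by 32 applies to the whole weighted sum and that rounding is
round-half-up (helper names are hypothetical):

```python
import math


def alpha_for_mode(mode):
    """Update weight: 2 in normal mode, 8 in post-enrollment (PE) mode."""
    return 2 if mode == "normal" else 8


def blend(ref, new, alpha):
    """Weighted update ((32 - alpha)*ref + alpha*new) / 32."""
    return ((32 - alpha) * ref + alpha * new) / 32


def round_to(x, denom):
    """Round x to the nearest 1/denom (32 for patterns, 10 for the rest)."""
    return math.floor(x * denom + 0.5) / denom
```

For example, blend(32, 64, alpha_for_mode("PE")) moves the reference
a quarter of the way toward the new value, while normal mode moves it
one sixteenth of the way.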

10.3.3.1.4  Phrase  did  not  pass.  If  the  phrase  did  not  pass  the 
verification  scoring  as  per  10.3.3.1.2,  the  entrant  may  or  may  not 
be  said  to  have  not  verified.  A  check  shall  be  made  to  determine  if 
more  phrases  remain  in  the  list  of  phrases  for  the  current  group, 
initial  or  auto-abort.  If  there  are  more  phrases,  the  verified/not 
verified  decision  shall  not  be  made.  Instead,  the  filter  gain 
adjustment  procedure  of  10.2.2.3.2  shall  be  implemented  and  the 
processing  shall  continue  as  per  10.3.2  where  the  phrase  to  be 
prompted  shall  be  the  next  phrase  in  the  current  group's  list  of 
phrases. If no more phrases remain in the current group's list, the
auto-abort procedure of 10.3.3.4 shall instead be implemented to
determine if a not verified decision is to be made.

TABLE XII

DECISION SCORE THRESHOLDS

                            Mode of Algorithm
Phrase Group    IREG    Post-Enrollment    Normal

Initial           1            0             100
                  2            0             120
                  3            0             135
                  4          145             145

Auto-Abort        1            0              85
                  2            0             110
                  3            0             130
                  4          145             145

10.3.3.2  Filter  overload.  If  a  filter  overload  occurred  during  the 
verification phrase processing of 10.3.2, the filter overload
procedures  of  10.2.2.3.1  shall  be  implemented.  Additionally,  the 
current  phrase  shall  be  appended  to  the  end  of  the  list  of  phrases 
in  the  current  group  where  it  may  or  may  not  be  reprompted  as 
determined  by  subsequent  decision  strategy.  With  the  appending  of 
this  phrase,  the  procedure  of  10.3.3.2.1  shall  be  implemented  to 
determine  if  the  entrant  shall  be  deemed  not  verified  due  to  too 
many  phrases  being  prompted. 

10.3.3.2.1  Not  verified  due  to  too  many  phrases.  Whenever  a  phrase 
is  appended  to  the  end  of  the  list  of  phrases  for  the  current  group, 
a  check  shall  be  made  as  to  whether  this  list  has  become  too  long. 

If the total number of phrases in the current group's list has
reached 20 in number, including those phrases already prompted from
the group, the verification process shall be halted and the decision
shall be that the entrant is "not verified due to too many phrases".
The  reference  file  data  shall  not  be  updated  in  this  case.  If, 
however,  the  list  is  less  than  20  in  length,  the  filter  gain 
adjustment  procedure  shall  be  performed  as  per  10.2.2.3.2  and  the 
verification  process  shall  continue  from  10.3.2  where  the  phrase  to 
be  prompted  shall  be  the  next  phrase  in  the  current  group's  list. 

10.3.3.3 Phrase mis-registered. For each group of phrases used in
the verification process, the verification algorithm permits a
maximum of one mis-registered phrase in normal mode or two
mis-registered phrases in PE mode. With the current phrase being deemed
mis-registered as per 10.3.3, NOTREG shall be incremented by one and
if  NOTREG  then  exceeds  this  maximum  for  the  group  of  phrases  and  for 
the  mode  selected  for  the  algorithm  in  10.3.1,  the  auto-abort 
procedures  of  10.3.3.4  shall  be  implemented.  If  NOTREG  remains  less 
than  or  equal  to  this  maximum,  then  this  phrase  shall  be  appended  to 
the  end  of  the  list  of  phrases  for  the  current  group  and  the  "not 
verified  due  to  too  many  phrases"  procedure  as  per  10.3.3.2.1  shall 
be  implemented. 
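
The mis-registration rule and the list-length check of 10.3.3.2.1 can
be sketched together. The function name, the action labels and the
representation of the group's phrase list as a Python list (including
phrases already prompted) are assumptions for illustration.

```python
MAX_MISREGISTERED = {"normal": 1, "PE": 2}
MAX_LIST_LENGTH = 20


def handle_misregistration(notreg, mode, phrases, current):
    """Return (action, updated NOTREG) after a mis-registered phrase.

    phrases : the current group's list, including phrases already prompted
    current : the phrase that just mis-registered
    """
    notreg += 1
    if notreg > MAX_MISREGISTERED[mode]:
        return "auto_abort", notreg  # 10.3.3.4
    # Append the phrase so that it may be reprompted later.
    phrases.append(current)
    if len(phrases) >= MAX_LIST_LENGTH:
        return "not_verified_too_many_phrases", notreg
    # Otherwise adjust the filter gain and prompt the next phrase.
    return "adjust_gain_and_prompt_next", notreg
```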

10.3.3.4  Auto-abort.  Whenever  all  four  phrases  of  a  group  have 
gone  through  the  phrase  registered  process,  10.3.3.1,  without  an 
entrant  verified  decision  being  made  or  whenever  the  maximum  number 
of  mis-registered  phrases  is  exceeded  as  per  10.3.3.3,  the 
procedures  given  herein  shall  be  implemented.  If  the  current  group 
of  phrases  is  the  first  or  initial  group,  auto-abort  will  not  have 
yet  been  implemented  but  shall  be  at  this  time.  If,  however,  the 
current group of phrases is the second or auto-abort group,
auto-abort will have already been implemented once and, since the entrant
shall be permitted only two groups of phrases per verification
attempt, the entrant shall be said to be not verified. If the not
verified  decision  is  made  at  this  time,  the  verification  procedure 
shall  be  halted  and  the  reference  file  data  shall  not  be  updated. 

If  the  decision  is  that  auto-abort  is  to  be  implemented  at  this 
time,  the  following  shall  be  performed. 

a.  Any  updating  data  saved  as  per  10.3.3.1.1  shall  be 
discarded. 

b.  The  parameters  IREG,  N0TREG,  EHAT,  and  EUSE  shall  be  reset 
to  zero. 

c.  The  filter  gain  adjustment  procedure  of  10.2.2.3.2  shall  be 
implemented. 

d.  The  second  or  auto-abort  group  of  phrases  shall  be  selected 
for  usage  in  subsequent  processing. 

When  this  is  complete,  the  processing  shall  resume,  10.3.2,  where 
the  first  phrase  in  this  second  group  of  phrases  shall  be  that 
phrase  which  is  prompted. 
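
The auto-abort step amounts to a single reset-and-switch operation. In
this sketch the state is a plain dictionary with assumed keys, and the
filter gain adjustment of 10.2.2.3.2 is not modeled.

```python
def auto_abort(state):
    """Apply 10.3.3.4 to a state dict; return the next action label.

    state keys (assumed): 'group', 'ireg', 'notreg', 'ehat', 'euse', 'saved'
    """
    if state["group"] == "auto-abort":
        # Second group already used: halt, no reference file update.
        return "not_verified"
    state.update(
        saved=[],            # discard update data saved per 10.3.3.1.1
        ireg=0, notreg=0, ehat=0, euse=0,
        group="auto-abort",  # select the second group of phrases
    )
    return "prompt_first_phrase_of_auto_abort_group"
```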

10.4 Training mode. The training mode is a special mode of
enrollment used to acquaint the entrant with the enrollment process
before  the  actual  enrollment  occurs.  The  training  mode  shall 
consist  of  that  processing  performed  to  create  initial  reference 
data  during  enrollment  with  the  exceptions  that  only  two  groups  of 
phrases  shall  be  used  in  the  processing  and  that  the  actual  creation 
of  initial  reference  data  shall  not  be  implemented.  With  these 
exceptions,  the  procedures  of  10.2.1  through  10.2.2.4  shall  be 
implemented.  Throughout  this  processing,  the  enrollment 
interruptions  given  in  10.2.5  shall  be  permitted. 

10.5  Validation  Mode.  The  validation  mode  is  a  special  mode  of 
verification  used  to  check  out  the  state  of  the  reference  file  data 
after  enrollment,  to  assist  an  entrant  who  is  exhibiting  unusual 
difficulty.  10.3  shall  apply  to  this  mode  except  that  the  updating 
of  10.2.3.4  shall  not  be  implemented.  The  count  of  the  number  of 
times  the  entrant  has  successfully  verified,  used  to  determine  the 
verification  algorithm  mode,  shall  not  be  affected  by  this 
procedure. 
MISSION

of

Rome Air Development Center

RADC plans and executes research, development, test and
selected acquisition programs in support of Command, Control,
Communications and Intelligence (C3I) activities. Technical
and engineering support within areas of technical competence
is provided to ESD Program Offices (POs) and other ESD
elements. The principal technical mission areas are
communications, electromagnetic guidance and control,
surveillance of ground and aerospace objects, intelligence data
collection and handling, information system technology,
ionospheric propagation, solid state sciences, microwave
physics and electronic reliability, maintainability and
compatibility.


